zsh-workers
* expr length "$val" returns the wrong length for values containing NULL (\\0)
@ 2015-12-10  1:52 D Gowers
  2015-12-10  3:56 ` Nikolay Aleksandrovich Pavlov (ZyX)
  0 siblings, 1 reply; 7+ messages in thread
From: D Gowers @ 2015-12-10  1:52 UTC (permalink / raw)
  To: zsh-workers

Test case:

v=$(printf foo\\0bar);expr length "$v";expr length $v

alternatively:

v=foo$'\0'bar;expr length "$v";expr length $v

In zsh, the values returned are 3 and 3.
In dash and bash, the values returned are 6 and 6.

Both of those results are wrong, AFAICS (foo$'\0'bar is 7 characters long).
But the zsh result is more severely wrong. I could understand the bash/dash
result, at least, as 'NUL characters are not counted towards length'.

In any case, it is easily demonstrated that the string is not 3 characters
long, by running 'echo "$v"' or 'print "$v"' or 'echo ${#v}'

`zsh --version` = 'zsh 5.2 (x86_64-unknown-linux-gnu)'


* Re: expr length "$val" returns the wrong length for values containing NULL (\\0)
  2015-12-10  1:52 expr length "$val" returns the wrong length for values containing NULL (\\0) D Gowers
@ 2015-12-10  3:56 ` Nikolay Aleksandrovich Pavlov (ZyX)
  2015-12-10  4:18   ` D Gowers
  0 siblings, 1 reply; 7+ messages in thread
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2015-12-10  3:56 UTC (permalink / raw)
  To: D Gowers, zsh-workers


10.12.2015, 04:52, "D Gowers" <finticemo@gmail.com>:
> Test case:
>
> v=$(printf foo\\0bar);expr length "$v";expr length $v
>
> alternatively:
>
> v=foo$'\0'bar;expr length "$v";expr length $v
>
> In zsh, the values returned are 3 and 3.
> In dash and bash, the values returned are 6 and 6.
>
> Both of those results are wrong, AFAICS (foo$'\0'bar is 7 characters long).
> But the zsh result is more severely wrong. I could understand the bash/dash
> result, at least, as 'NUL characters are not counted towards length'.

Both results are *right*. In both cases you ask for the length of the string that `expr` receives, and you get it.

In dash (and also posh, bash and busybox ash) the zero byte is dropped when the value is stored, so the length of $v *is* six. You may question whether dropping the zero byte is right, but the fact that all four shells behave exactly the same way makes me think this is part of the POSIX standard. In any case, non-C strings are not on the feature list of these shells, unlike zsh. (zsh also uses NUL-terminated C strings internally, but zero bytes and some other characters are “metafied” (i.e. escaped) in memory and unmetafied when passed to the outer world, e.g. when `echo $v` writes the string to the terminal.)

As I said, in zsh the zero byte is stored. But C strings, which are the only thing that can be passed as arguments to a program, are **NUL-terminated**. So what you actually pass is the string "foo", because the NUL terminates the string. You therefore cannot possibly get the answer you think is right here unless you reimplement `expr` as a zsh function.
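
To illustrate the last point, here is a minimal sketch of measuring the length inside the shell instead of through the external `expr`. The function name `shell_length` is made up for this example and does not come from the thread; it relies only on the standard `${#parameter}` expansion.

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# report the length of the first argument using the shell's own ${#...}
# expansion, so the value is never handed to an external program.
shell_length() {
    printf '%s\n' "${#1}"
}

shell_length "foobar"   # prints 6
```

Under zsh, `v=foo$'\0'bar; shell_length "$v"` should print 7, since the value never leaves the shell; in a shell that drops the zero byte when storing the value it would print 6.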

>
> In any case, it is easily demonstrated that the string is not 3 characters
> long, by running 'echo "$v"' or 'print "$v"' or 'echo ${#v}'
>
> `zsh --version` = 'zsh 5.2 (x86_64-unknown-linux-gnu)'



* Re: expr length "$val" returns the wrong length for values containing NULL (\\0)
  2015-12-10  3:56 ` Nikolay Aleksandrovich Pavlov (ZyX)
@ 2015-12-10  4:18   ` D Gowers
  2015-12-10  4:29     ` Nikolay Aleksandrovich Pavlov (ZyX)
  0 siblings, 1 reply; 7+ messages in thread
From: D Gowers @ 2015-12-10  4:18 UTC (permalink / raw)
  To: Nikolay Aleksandrovich Pavlov (ZyX); +Cc: zsh-workers

Ah, okay. That (command-line arguments not being able to contain NUL)
seems a bit anachronistic. But I guess it's never been enough of a
problem to warrant the considerable bother to fix it. Fair enough.


* Re: expr length "$val" returns the wrong length for values containing NULL (\\0)
  2015-12-10  4:18   ` D Gowers
@ 2015-12-10  4:29     ` Nikolay Aleksandrovich Pavlov (ZyX)
  2015-12-10  5:00       ` D Gowers
  0 siblings, 1 reply; 7+ messages in thread
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2015-12-10  4:29 UTC (permalink / raw)
  To: D Gowers; +Cc: zsh-workers

10.12.2015, 07:18, "D Gowers" <finticemo@gmail.com>:
> Ah, okay. That (commandline arguments not being able to contain NUL) seems.. a bit anachronistic. But I guess it's never been enough of a problem to warrant the considerable bother to fix it. Fair enough.

This has nothing to do with the command line itself. In the very early days it was decided that strings would be NUL-terminated (instead of, e.g., being structs with a size_t size and a char *data), and this convention sneaked into many parts of many standards. If you write C code you will have problems with strings containing NUL, because every library function that accepts something other than a void* pointer to “generic data” assumes that the string ends at the first NUL. Projects like zsh, and almost every programming language, have to write their own string implementations: in zsh it is C strings with escaped characters; in most other cases it is a length+data pair.

Among the functions that follow the NUL convention are the exec* family, which is used to launch programs, and, on the other side, main(), which accepts NUL-terminated strings. So you cannot really do anything to fix this: replacing one of the core conventions is *very* expensive, especially since you must do it in a backward-compatible way.
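
The effect of the exec*/main() boundary can be observed from the shell with a small sketch. `received_length` is a hypothetical helper (not from the thread) that starts a child `sh` and has it count the bytes of the argument it actually received.

```shell
# Hypothetical helper: launch a child sh and let it report how many bytes
# of its first argument arrived on the other side of the exec* call.
received_length() {
    # "$1" is passed as a NUL-terminated C string, so the child can only
    # see the bytes before the first NUL (if the calling shell kept one).
    sh -c 'printf "%s" "$1" | wc -c | tr -d "[:space:]"; echo' sh "$1"
}

received_length "foobar"   # prints 6
```

With the value from the original test case, `received_length "$v"` would be expected to print 3 under zsh and 6 under dash or bash, matching the numbers reported for `expr length`.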


* Re: expr length "$val" returns the wrong length for values containing NULL (\\0)
  2015-12-10  4:29     ` Nikolay Aleksandrovich Pavlov (ZyX)
@ 2015-12-10  5:00       ` D Gowers
  2015-12-10  9:37         ` Peter Stephenson
  0 siblings, 1 reply; 7+ messages in thread
From: D Gowers @ 2015-12-10  5:00 UTC (permalink / raw)
  To: Nikolay Aleksandrovich Pavlov (ZyX); +Cc: zsh-workers

I am aware of the prevalence of NUL-terminated strings, since I've coded in
C in the past; that's why I wrote 'considerable bother to fix it'.
Nevertheless, for a purpose such as argument passing, size + data is
clearly better (easier to secure and more flexible).


* Re: expr length "$val" returns the wrong length for values containing NULL (\\0)
  2015-12-10  5:00       ` D Gowers
@ 2015-12-10  9:37         ` Peter Stephenson
  2015-12-10 17:47           ` Nikolay Aleksandrovich Pavlov (ZyX)
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Stephenson @ 2015-12-10  9:37 UTC (permalink / raw)
  To: zsh-workers

On Thu, 10 Dec 2015 15:30:03 +1030
D Gowers <finticemo@gmail.com> wrote:
> I am aware of the prevalence of NUL-terminated strings, since I've coded in
> C in the past, that's why I wrote 'considerable bother to fix it'.
> Nevertheless, for a purpose such as argument passing, size + data is
> clearly better (easier to secure and more flexible)

The main point here --- which doesn't seem to have been mentioned --- is
that expr isn't a shell builtin.  Within the shell, we do indeed treat
NUL characters as normal characters.  As soon as you pass them outside,
you are stuck --- there's no mechanism, nor even a convention, for passing
embedded NULs; adding one would require a rethink of the standard
library conventions. This is a problem, but not a shell problem.
So if you want to continue the argument, you'll need to find some
higher power mailing list.
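
Whether a command is a builtin or a separate program can be checked from the shell itself. The helper below is a hypothetical sketch (the name `is_external` is not from the thread); it assumes that `command -v` prints an absolute path for external programs and a bare name for builtins and functions, which is what the common shells do.

```shell
# Hypothetical helper: succeed if the named command resolves to a file on
# disk rather than to something implemented inside the shell.
is_external() {
    case "$(command -v "$1")" in
        /*) return 0 ;;   # resolved to a path: a separate program
        *)  return 1 ;;   # builtin, function, alias, or not found
    esac
}

if is_external expr; then
    echo "expr is an external program"
else
    echo "expr is handled inside the shell"
fi
```

Any command for which `is_external` succeeds receives its arguments through exec*, and is therefore subject to the NUL-termination limit described above.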

pws



* Re: expr length "$val" returns the wrong length for values containing NULL (\\0)
  2015-12-10  9:37         ` Peter Stephenson
@ 2015-12-10 17:47           ` Nikolay Aleksandrovich Pavlov (ZyX)
  0 siblings, 0 replies; 7+ messages in thread
From: Nikolay Aleksandrovich Pavlov (ZyX) @ 2015-12-10 17:47 UTC (permalink / raw)
  To: Peter Stephenson, zsh-workers

10.12.2015, 12:38, "Peter Stephenson" <p.stephenson@samsung.com>:
> On Thu, 10 Dec 2015 15:30:03 +1030
> D Gowers <finticemo@gmail.com> wrote:
>>  I am aware of the prevalence of NUL-terminated strings, since I've coded in
>>  C in the past, that's why I wrote 'considerable bother to fix it'.
>>  Nevertheless, for a purpose such as argument passing, size + data is
>>  clearly better (easier to secure and more flexible)
>
> The main point here --- which doesn't seem to have been mentioned --- is
> that expr isn't a shell builtin. Within the shell, we do indeed treat

It was not directly mentioned, but I did say that you need to write `expr` as a zsh function for this code to work as expected.

> NUL characters as normal characters. As soon as you pass them outside,
> you are stuck --- there's no mechanism nor even convention for passing

There actually is: use file descriptors, either via pipes or via temporary files (where the fd is opened by the receiving program). This is the most universal approach, and many programs can work with data containing NULs this way (e.g. `grep --null` combined with `xargs -0`). Not everything supports it, though. Most other approaches I know of involve escaping or quoting of some sort and are much more ad hoc.
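
As a rough sketch of the pipe approach: the two functions below are hypothetical helpers (their names are not from the thread). The first counts the bytes that arrive through a pipe; the second splits NUL-delimited records with `xargs -0`, which is assumed to be available as it is in GNU and BSD xargs.

```shell
# Count the bytes that arrive through a pipe; here the NUL is just
# another byte in the stream and is not treated as a terminator.
bytes_through_pipe() {
    printf 'foo\0bar' | wc -c | tr -d '[:space:]'
    echo
}

# Split NUL-delimited records with xargs -0; each record becomes the
# single argument of a separate printf invocation.
split_on_nul() {
    printf 'one\0two words\0' | xargs -0 -n 1 printf '[%s]\n'
}

bytes_through_pipe   # prints 7
split_on_nul         # prints [one] and [two words] on separate lines
```

The NUL survives only while the data stays in the stream; once `xargs` turns each record into an argument, the NUL has served as a delimiter and is not part of the value.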

> embedded NULLs which would require a rethink about the standard
> library conventions. This is a problem, but not a shell problem.
> So if you want to continue the argument, you'll need to find some
> higher power mailing list.
>
> pws


