Re: [PATCH] zsh/random module [UPDATED]

zsh-workers
 help / color / mirror / code / Atom feed

From: Clinton Bunch <cdb_zsh@zentaur.org>
To: zsh-workers@zsh.org
Subject: Re: [PATCH] zsh/random module [UPDATED]
Date: Wed, 23 Nov 2022 20:58:36 -0600	[thread overview]
Message-ID: <4e500f9c-48ef-e1eb-ed7c-5895bd5473ab@zentaur.org> (raw)
In-Reply-To: <20221123194658.GM27622@tarpaulin.shahaf.local2>

This is a from memory reply as I'm away from my git repository/code for 
the holiday. :)  So there may be a follow up.

On 11/23/2022 1:46 PM, Daniel Shahaf wrote:
> Clinton Bunch wrote on Mon, Nov 07, 2022 at 18:18:15 -0600:
> +++ b/Doc/Zsh/mod_random.yo
>> @@ -0,0 +1,67 @@
>> +startitem()
>> +findex(getrandom)
>> +cindex(random data, printing, array)
>> +xitem(tt(getrandom) [ tt(-c) var(count) ] [ tt(-r) ] [ tt(-s) var(scalar) ] [tt(-i) ] [ tt(-a) var(array) ] [ tt(-L) var(lower_bound) ] [ tt(-U) var(upper_bound) ])
> Never use xitem() without item() after it.
I'm using existing docs as a template as I don't know YODL at all, so 
I'm not surprised I made mistakes.
>
> I don't understand this API.
>
> - Magic numbers:
>
>    + Why should -c default to 8 in the "random bytes" case?  Why
>      shouldn't it default to some other value, or even be required to be
>      specified explicitly by the user?
Not sure why 8 was the number I thought of, just that one wasn't 
particularly useful.
>
>    + Why should -c0 be an error?  It should be valid to request an array
>      of 0 random numbers, just like, say, «yes | head -n 0».
I thought nonsensical options must be an error.
>
>    + Why should -c65 be an error?  (Saw your comment elsethread that you
>      imagine that would suffice.  That's not a good reason for this limit.)
It started as a 256 byte limit from getrandom().
>
>    + Why should -L$(( 2**32 - 1)) be an error?  I should be able to request
>      a random integer in the range { x | 42 ≤ x ≤ 42 }.  It's perfectly
>      well-defined.
if the difference between lower and upper is zero, you get a divide by 
zero error unless you special case it, and it never occurred to me that 
someone would specify that case deliberately
>
> - Why should -c's default depend on anything at all?  You mentioned
>    downthread you consider making -c1 the default in all cases; that'd be
>    better.
>
> - The builtin can generate either integers or bytes.  Most of the API
>    (the type of "returned random data" that -c deals with, the default
>    number of those, validity of -L/-U, etc.) is split right along this
>    line.  Shouldn't this become two separate builtins, then?  One for
>    bytes and one for integers?
>
>    (More on this below, at the end of the implementation review.)
>
> - Interdependencies.  This can generate either bytes or unsigned ints,
>    and bytes can be emitted either in hex, raw, or in decimal, and either
>    to stdout or to a scalar or to an array… but not all of these
>    combinations are possible.  For instance, adding «-a» to «getrandom -c 1»
>    changes the output encoding from hex to decimal AND the output channel
>    from stdout to $reply.  It seems better to avoid coupling things like
>    this.
>
>> +subsect(Parameters)
>> +
>> +startitem()
>> +vindex(SRANDOM)
>> +item(tt(SRANDOM)) (
>> +A random positive 32-bit integer between 0 and 4,294,967,295.  This parameter
>> +is read-only.
>> +)
>> +enditem()
>> +
>> +subsect(Math Functions)
>> +
>> +startitem()
>> +item(tt(zrandom+LPAR()RPAR())) (
> Doesn't this need a findex()?
>
> Do other math functions have index entries?  And if not (systell()
> doesn't), shouldn't they?
>
>> +++ b/Src/Modules/random.c
>> @@ -0,0 +1,675 @@
> There's a whole bunch of orthographic issues in this file (whitespace,
> punctuation, spelling, etc.).  I won't point them out now since there
> are bigger issues.
>
>> +#include <stdbool.h>
> C99-ism.  (zsh is written in C89.)
>
>> +/* buffer to pre-load integers for SRANDOM to lessen the context switches */
>> +uint32_t rand_buff[8];
> C99-ism (the array element type).

I asked about POSIX standards as the dev guide seemed badly out of date 
in workers/50819

Got this response in workers/50847


On Tue, Oct 25, 2022 at 6:52 PM Clinton Bunch <cdb_zsh@zentaur.org> wrote:
 >
 > My personal opinion is that development should use at least the
 > POSIX-1.2001 standard.  It's 20 years old.  That's surely old enough for
 > any system still running.  (It's  certainly old enough that any such
 > system is not supported by the vendor)

OTOH any system not supported by the vendor is exactly one where
somebody might be trying to build their own zsh binary.

That said, I thought we standardized on c99 rather than c89 quite some 
time ago.

>
> Even in C99, wouldn't this require an explicit #include?  I guess it
> works now due to transitive #include's, but I don't know that it's
> guaranteed to work everywhere.
>
>> +getrandom_buffer(void *buf, size_t len)
>> +{
> ⋮
>> +    do {
>> +#ifdef HAVE_ARC4RANDOM
>> +	/* Apparently, the creaters of arc4random didn't believe in errors */
> Leave out the /ad homines/.
>
> FWIW, I would sooner guess arc4random() was originally written on
> a platform where it /couldn't/ error.
>
>> +	arc4random_buf(buf,len);
>> +	ret = len;
>> +#elif defined(HAVE_GETRANDOM)
>> +	ret=getrandom(bufptr,(len - val),0);
>> +#else
>> +	ret=read(randfd,bufptr,(len - val));
> The premise of the module is to offer an interface to high-quality
> random numbers.  Is /dev/urandom, which is the module falls back to
> here, necessarily of sufficient quality?  If not, shouldn't there be
> a way for the module's user to determine the source of randomness?
arc4random and getrandom are just more efficient access to the pool of 
/dev/urandom.   AFAIK it's as good a pseudo-random source as any on any 
platform.
>
>> +#endif
>> +	if (ret < 0) {
>> +	    if (errno != EINTR || errflag || retflag || breaks || contflag) {
>> +		zwarn("Unable to get random data: %e.", errno);
> Need to set «errno = 0» first, surely?
>
>> +/*
>> + * Implements the getrandom builtin to return a string of random bytes of
>> + * user-specified length.  It either prints them to stdout or returns them
>> + * in a parameter passed on the command line.
>> + *
>> + */
>> +
>> +/**/
>> +static int
>> +bin_getrandom(char *name, char **argv, Options ops, int func)
>> +{
> ⋮
>> +    bool integer_out = false;	 /*  boolean */
> The target audience for comments knows C.  So, nuke this comment.
>
> You might define inline enums here (for ints/bytes, scalar/array/stdout,
> and decimal/hex/raw) to make the code more self-documenting.
>
> Incidentally, this would also immediately raise the question of why some
> combinations of these aren't valid (e.g., hex or raw bytes output to an
> array).
>
>> +    int i, j;		         /* loop counters */
>> +
>> +    /*maximum string length of a 32-bit integer + null */
>> +    char int2str[11];
> Have the comment point out that this doesn't include a sign byte?
>
> FWIW, we have a 64-bit version of this in DIGBUFSIZE.
>
>> +    /* Vailues for -U *nd -L */
>> +    zulong   upper = UINT32_MAX, lower = 0;
>> +    uint32_t diff  = UINT32_MAX;
>> +    uint32_t min   = 0, rej = 0;  /* Reject below min and count */
> «rej» is never read.
>
>> +    bool     bound = false;       /* set if upper or lower bound specified */
> Why is this variable needed?  Having it at all means that passing «-L 0
> -U $((2**32-1))» is different to not passing any -L or -U option at all,
> even though these values are ostensibly the defaults.
>
>> +    if (OPT_ISSET(ops, 'U')) {
>> +	if (!zstrtoul_underscore(OPT_ARG(ops, 'U'), &upper)) {
>> +	    zwarnnam(name, "Couldn't convert -U %s to a number.",
>> +		    OPT_ARG(ops, 'U'));
>> +	    return 1;
>> +	} else if (upper > UINT32_MAX || upper < 1) {
>> +	    zwarnnam(name,
>> +		    "Upper bound must be between 1 and %z you specified: %z.",
>> +		     (zlong) UINT32_MAX, (zlong) upper);
> If «upper > ZLONG_MAX», the downcast to zlong would result in
> a negative number.
zwarnnam has no zulong format and upper is supposed to be a 32-bit integer.
>
>> +	    return 1;
> ⋮
>> +    if (bound) {
> ⋮
>> +	if (lower > upper) {
>> +	    zwarnnam(name, "-U must be larger than -L.");
>> +	    return 1;
>> +	}
>> +
>> +	diff = upper == UINT32_MAX && lower == 0 ? UINT32_MAX :
>> +		upper - lower + 1;
> How can this possibly be right?  As written it means the meaning of
> «diff» with the default values of «upper»/«lower» is different to its
> meaning in literally every other case.
>
> Ah, I see.  Without special-casing the default values, explicitly
> passing «-L 0 -U $((2**32 - 1))» would incorrectly trigger the
> zwarnnam() just below, because «diff» is a uint32_t.  Thus, the
> special-casing is an incorrect fix to a real issue.
>
>> +	if(!diff) {
>> +	    zwarnnam(name, "Upper and Lower bounds cannot be the same.");
>> +	    return 1;
>> +	}
> This warning will never be printed.  [Why?  Because «diff» is either
> UINT32_MAX, which is true, or «(uint32_t)(zulong)(upper - lower + 1)»,
> which, taking into account the range checks on «upper» and «lower», can
> only work out to 0 if «upper == UINT32_MAX && lower == 0»… which is
> precisely the case that's special-cased in the assignment to «diff» just
> above.]
>
> I suppose it was intended to be printed for «getrandom -L 42 -U 42».  In
> practice, that incantation just prints 42.  As I mentioned in the API
> review, I think that's desired behaviour.  In either case, please add
> a test.
>
>> +    }
>> +
>> +    if(!OPT_ISSET(ops,'c'))
>> +	if ((!bound && !integer_out) || arrname)
>> +	len = 8;
> So one default value of -c is here and the other is up in the
> declaration of «len».  That's somewhat confusing.
>
>> +
>> +
>> +    if (!byte_len)
>> +	byte_len = len;
> The condition is always true.
>
> Didn't you get a compiler warning for this?
No.  Not in Developer's Studio, clang, gcc, or HP ANSI C.
>
>> +
>> +
>> +    if (bound) {
>> +	pad = len / 16 + 1;
>> +	byte_len = (len + pad) * sizeof(uint32_t);
> «pad» should be declared in this scope since it's not used elsewhere.
> Ditto «outbuff» below and others.
>
> Why «len / 16 + 1»?
I couldn't find any statistical information that would tell me how many 
numbers were likely to be rejected, so I guessed.
>> +    } else if (integer_out)
>> +	byte_len = len * sizeof(uint32_t);
>> +
>> +    if (byte_len > 256)
>> +	byte_len=256;
>> +
> Why not report an error to the user?
>
> ⋮
>> +    min = -diff % diff;
> See above about the special casing in the assignment to «diff».  I guess
> it affects this line.
>
> ⋮
>> +	sprintf(int2str, "%u", int_val);
> Hold on.  %u expects an «unsigned int» and «int_val» is a «uint32_t».
> So, this line is only correct when «unsigned int» happens to be a 32-bit
> type… while there's nothing stopping that type from being a 64-bit type,
> or even a 16-bit type, and in either case this line would read the wrong
> number of bytes off the varargs.
>
>> +	if (arrname) {
>> +	    array[i]=ztrdup(int2str);
>> +	} else if (scalar && len == 1) {
>> +	    setiparam(scalar, int_val);
>> +	} else {
>> +	    printf("%s ",int2str);
>> +	}
>> +    }
>> +    if (arrname) {
>> +	setaparam(arrname,array);
>> +    } else if (!scalar) {
>> +	printf("\n");
>> +    }
>> +
>> +
>> +    zfree(buf, byte_len);
>> +    return 0;
>> +}
>> +
> In the API docs review I proposed to split «getrandom» into two separate
> builtins.
>
> Having read this function, I reiterate that proposal.  This function,
> with all the if/then cases that try to keep track of which of the
> function's different modes of operation was requested, is difficult to
> read and to modify.
>
>> +
>> +/*
>> + * Provides for the SRANDOM parameter and returns an unsigned 32-bit random
>> + * integer.
>> + */
>> +
>> +/**/
>> +static zlong
>> +get_srandom(UNUSED(Param pm)) {
>> +
>> +    if(buf_cnt < 0) {
>> +	getrandom_buffer((void*) rand_buff,sizeof(rand_buff));
>> +	buf_cnt=7;
> Magic number.  You might want «sizeof(rand_buff) / sizeof(rand_buff[0]) - 1»…
>
> …but it'd be more idiomatic to leave that «- 1» out and then change the
> postfix decrement to a prefix decrement.  That would also be more
> consistent with the variable's name.  (As written, «buf_cnt == 0» means
> there's one more zlong in it; it's not a "count" but a "largest valid
> index".)
>
>> +    }
>> +    return rand_buff[buf_cnt--];
>> +}
>> +
>> +/*
>> + * Implements the mathfunc zrandom and returns a random floating-point
>> + * number between 0 and 1.  Does this by getting a random 32-bt integer
>> + * and dividing it by UINT32_MAX.
>> + *
> Presumably this means to guarantee a /uniformly-distributed/ random
> floating-point number in the given range.  So, why does "getting
> a random 32-bit integer and dividing it by UINT32_MAX" actually
> implement that guarantee?
>
> Oh, wait, never mind.  The comment is out of date.  Fix it.
>
>> + */
>> +
>> +/**/
>> +static mnumber
>> +math_zrandom(UNUSED(char *name), UNUSED(int argc), UNUSED(mnumber *argv),
>> +	     UNUSED(int id))
>> +{
>> +    mnumber ret;
>> +    double r;
>> +
>> +    r = random_real();
>> +    if (r < 0) {
>> +	zwarnnam(name, "Failed to get sufficient random data.");
> Say why.  (Probably via errno?)
>
> Rule of thumb: every error message should include at least one replaceable.
>
> (Further instances of this elsewhere in the file.)
>
>> +/**/
>> +int
>> +setup_(UNUSED(Module m))
>> +{
>> +#ifdef USE_URANDOM
>> +    /* Check for the existence of /dev/urandom */
>> +
>> +    struct stat st;
>> +
>> +    if (lstat("/dev/urandom",&st) < 0) {
> Why not stat()?
Is it appropriate for /dev/urandom to be a symlink?
>
>> +	zwarn("No kernel random pool found.");
>> +	return 1;
>> +    }
>> +
>> +    if (!(S_ISCHR(st.st_mode)) ) {
>> +	zwarn("No kernel random pool found.");
>> +	return 1;
>> +    }
>> +#endif /* USE_URANDOM */
>> +    return 0;
>> +}
>> +/**/
>> +int
>> +_zclz64(uint64_t x) {
> ⋮
>> +    if (!(x & 0xFFFFFFFF00000000)) {
> Don't you need ZLONG_CONST() here?
>
>> +/**/
>> +uint64_t
>> +random64(void) {
>> +    uint64_t r;
>> +
>> +    if(getrandom_buffer(&r,sizeof(r)) < 0) {
>> +	zwarn("zsh/random: Can't get sufficient random data.");
>> +	return 1; /* 0 will cause loop */
> Where is the docstring of random64() itself?  /That/ is what determines
> whether or not returning 0 would be correct.
>
> What loops and why?
>
> Also, I suspect this name will collide with a third-party function one
> day.  Recommending to rename.
>
>> +    }
>> +
>> +    return r;
>> +}
>> +
>> +/* Following code is under the below copyright, despite changes for error
>> + * handling and removing GCCisms */
>> +
> -1.  Don't say "Following code"; identify the code explicitly, or better
> yet, move it to its own *.c file.  Otherwise, any function we might
> append to this file would be taken to be covered by this copyright
> notice and license statement.
>
>> +/*
>> + * random_real: Generate a stream of bits uniformly at random and
>> + * interpret it as the fractional part of the binary expansion of a
>> + * number in [0, 1], 0.00001010011111010100...; then round it.
>> + */
>> +
>> +/**/
>> +double
> Need «static».
>
> (Further instances of this elsewhere in the file.)
>
>> +random_real(void)
>> +{
> ⋮
>> +}
>> +++ b/Test/V15random.ztst
>> @@ -0,0 +1,51 @@
>> +# Tests for the zsh/random module
>> +%test
>> +  getrandom -U 56
>> +1f:Checking if system has a kernel random source
>> +?(eval):1: No kernel random pool found.
>> +
> The test point's code has nothing to do with the test point's
> description and expected errput.
>
> The 'f' flag is why this test point's failure doesn't get reported.
The test is supposed to fail.
>
>> +  getrandom -U 64
>> +0:Checking upper Bound is observed
>> +*><0-64>
>> +
>> +  tmpmsg="failed"
>> +  getrandom -L 4 -U 5 -c 16 -a tmpa
>> +  for i in ${tmpa[@]}
>> +  do
>> +    if (( i == 5 )); then
>> +      tmpmsg="success"
>> +      break
>> +    fi
>> +  done
> Use ${tmpa[(I)5]} ?
>
>> +  echo $tmpmsg
>> +0:Checking that upper bound is inclusive
>> +F:Try running the test again: make TESTNUM=V15 check
> Baby and bathwater.  Say what the problem is first and what the
> recommended fix is second.  In this case, something along the lines
> of "This test WILL false positive once every 65536 runs on average.  To
> rule this out, run the test again."
>
> Oh, and bump that 16 to something 3 or 4 times as big, because a 1/65536
> chance isn't really enough in a world where automated builds (CI,
> distros' QA, etc.) is a thing.
>
>> +>success
>> +
>> +  echo $(( zrandom() ))
>> +0:Checking if zrandom() produces a floating-point number between 0 and 1
>> +*>0(|.)[[:digit:]]#
> Why is the period optional?  Something like "042" shouldn't be accepted.
>
>> diff --git a/configure.ac b/configure.ac
>> index 074141d38..f1fa01274 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -675,6 +675,7 @@ fi
>>   AC_CHECK_HEADERS(sys/time.h sys/times.h sys/select.h termcap.h termio.h \
>>   		 termios.h sys/param.h sys/filio.h string.h memory.h \
>>   		 limits.h fcntl.h libc.h sys/utsname.h sys/resource.h \
>> +                 sys/random.h \
> Tabs/spaces are wrong.  Also in the next hunk.
>
>>   		 locale.h errno.h stdio.h stdarg.h varargs.h stdlib.h \
>>   		 unistd.h sys/capability.h \
>>   		 utmp.h utmpx.h sys/types.h pwd.h grp.h poll.h sys/mman.h \
>> @@ -1337,6 +1338,7 @@ AC_CHECK_FUNCS(strftime strptime mktime timelocal \
>>   	       cygwin_conv_path \
>>   	       nanosleep \
>>   	       srand_deterministic \
>> +               getrandom arc4random \
>>   	       setutxent getutxent endutxent getutent)
>>   AC_FUNC_STRCOLL
>>   
> Bottom line:
>
> - Interfacing to kernel random number APIs seems reasonable enough.
>
> - The API to shell scripts feels unintuitive.
>
> - As the patch stands, it is not ready to be merged (even assuming
>    arguendo the API is perfect as-is).
>
> Thanks,
>
> Daniel
>

next prev parent reply	other threads:[~2022-11-24  2:59 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-02 17:13 [PATCH] zsh/random module Clinton Bunch
2022-11-03 17:50 ` Bart Schaefer
2022-11-04  3:17 ` dana
2022-11-04  6:22   ` Clinton Bunch
2022-11-04  7:27     ` dana
2022-11-04 12:57       ` Clinton Bunch
2022-11-08  0:18         ` [PATCH] zsh/random module [UPDATED] Clinton Bunch
2022-11-18 14:30           ` Clinton Bunch
2022-11-19  6:42             ` Lawrence Velázquez
2022-11-18 16:23           ` Stephane Chazelas
2022-11-18 17:08             ` Clinton Bunch
2022-11-18 18:12               ` Stephane Chazelas
2022-11-18 18:38                 ` Clinton Bunch
2022-11-23 19:52                   ` Daniel Shahaf
2022-11-24 16:19                     ` Stephane Chazelas
2022-11-24 16:30                       ` Roman Perepelitsa
2022-11-24 22:39                         ` Clinton Bunch
2022-11-25  8:53                           ` Stephane Chazelas
2022-11-25  9:40                             ` Stephane Chazelas
2022-11-28 16:37                               ` further discussion of zsh/random (was [PATCH] zsh/random module [UPDATED]) Clinton Bunch
2022-11-21  1:07           ` [PATCH] zsh/random module [UPDATED] Matthew Martin
2022-11-21  1:59             ` Clinton Bunch
2022-11-21  2:21               ` Matthew Martin
2022-11-21  2:57                 ` Clinton Bunch
2022-11-21  3:14                   ` Lawrence Velázquez
2022-11-21  4:17                     ` Bart Schaefer
2022-11-21  5:05                       ` Clinton Bunch
2022-11-22 13:42                         ` dana
2022-11-23 19:49                         ` Daniel Shahaf
2022-11-22 17:44                       ` Oliver Kiddle
2022-11-22 19:48                         ` Clinton Bunch
2022-11-23  1:23                   ` Matthew Martin
2022-11-23  2:58                     ` Clinton Bunch
2022-11-23  4:14                       ` Matthew Martin
2022-11-23 13:41                         ` Clinton Bunch
2022-11-23 20:33                           ` Daniel Shahaf
2022-11-23 21:42                             ` dana
2022-11-23 23:54                               ` Daniel Shahaf
2022-11-24  0:17                                 ` Daniel Shahaf
2022-11-24  1:05                                 ` dana
2022-11-24 13:52                               ` Clinton Bunch
2022-11-23 19:46           ` Daniel Shahaf
2022-11-24  2:58             ` Clinton Bunch [this message]
2022-11-24 10:07               ` nimaje+zml
2022-11-24 13:19                 ` Clinton Bunch
2022-11-24 14:33             ` Clinton Bunch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e500f9c-48ef-e1eb-ed7c-5895bd5473ab@zentaur.org \
    --to=cdb_zsh@zentaur.org \
    --cc=zsh-workers@zsh.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).