zsh-workers
 help / color / mirror / code / Atom feed
* Closing descriptors in forked (pipe) command
@ 2017-02-18 12:45 Sebastian Gniazdowski
  2017-02-19  5:02 ` Bart Schaefer
  0 siblings, 1 reply; 4+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-18 12:45 UTC (permalink / raw)
  To: zsh-workers

Hello,
in db/gdbm module, I do:
        addmodulefd(gdbm_fdesc(dbf), FDT_MODULE);

this should lead to problem, but it behaves the same as with
FDT_INTERNAL, i.e. following:

% ztie -d db/gdbm -f ~/db3 mybase2
% fun() { while read line; do echo $line; done }
% eval "print -rl -- \${(kv)mybase2[@]}" | fun
% echo $mybase2

The point is: forked process inherits database connection. GDBM cannot
have multiple writers (open is in write mode). Above outputs:

(Dbase accessed)
testkey
testdata

(Dbase accessed)
testkey
testdata

"(Dbase accessed)" is printf from code where $mybase2 turns out to be
empty for given key and database fetch is ordered. So it's visible that
there is fork because database is accessed twice – $mybase2 is still
empty at line "echo $mybase2". Good that GDBM handles change to
file-descriptor's file position pointer (AFAIK it's shared between FDs),
and reads the data again correctly. With write it also works, but
potentially the database might get corrupted silently:

% ztie -d db/gdbm -f ~/db3 mybase2
% fun() { while read line; do echo $line; done }
% eval "mybase2[漢]=字" | fun
(Dbase accessed)
% echo $mybase2[漢]
(Dbase accessed)
字

Maybe it would be good to solve this? FDT_FORKCLOSE or something, there
is closeallelse(), for multiio, didn't go deeper in this.

-- 
  Sebastian Gniazdowski
  psprint2@fastmail.com


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Closing descriptors in forked (pipe) command
  2017-02-18 12:45 Closing descriptors in forked (pipe) command Sebastian Gniazdowski
@ 2017-02-19  5:02 ` Bart Schaefer
  2017-02-19  8:04   ` Sebastian Gniazdowski
  2017-02-19 10:51   ` [PATCH] " Sebastian Gniazdowski
  0 siblings, 2 replies; 4+ messages in thread
From: Bart Schaefer @ 2017-02-19  5:02 UTC (permalink / raw)
  To: Sebastian Gniazdowski, zsh-workers

On Feb 18,  4:45am, Sebastian Gniazdowski wrote:
}
} in db/gdbm module, I do:
}         addmodulefd(gdbm_fdesc(dbf), FDT_MODULE);
} 
} The point is: forked process inherits database connection. GDBM cannot
} have multiple writers (open is in write mode).

That's not true -- "man gdbm" even mentions:

   If  the  database  was  opened with the GDBM_NOLOCK flag, the user may
   wish to perform their own file locking on the database file  in  order
   to prevent multiple writers operating on the same file simultaneously.

} Maybe it would be good to solve this? FDT_FORKCLOSE or something, there
} is closeallelse(), for multiio, didn't go deeper in this.

In the general case a subshell should be inheriting all descriptors from
the parent, and it's up to the shell programmer not to do stupid things.
Consider the case where the parent shell wishes to ztie to the database,
fork off a subshell that will make use of the tied parameter, and then
go on to exec something else or otherwise exit, leaving the subshell to
manage the database.

You/we certainly may wish to document that it is dangerous to assume
that parent and subshell can both manipulate the tied parameter, but
it would be incorrect to automatically close the file descriptor in
either of them.

This is actually part of the reason why bin_ztie() uses GDBM_SYNC
when opening the database file.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Closing descriptors in forked (pipe) command
  2017-02-19  5:02 ` Bart Schaefer
@ 2017-02-19  8:04   ` Sebastian Gniazdowski
  2017-02-19 10:51   ` [PATCH] " Sebastian Gniazdowski
  1 sibling, 0 replies; 4+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-19  8:04 UTC (permalink / raw)
  To: Bart Schaefer, zsh-workers

On Sat, Feb 18, 2017, at 09:02 PM, Bart Schaefer wrote:
> On Feb 18,  4:45am, Sebastian Gniazdowski wrote:
> }
> } in db/gdbm module, I do:
> }         addmodulefd(gdbm_fdesc(dbf), FDT_MODULE);
> } 
> } The point is: forked process inherits database connection. GDBM cannot
> } have multiple writers (open is in write mode).
> 
> That's not true -- "man gdbm" even mentions:
> 
>    If  the  database  was  opened with the GDBM_NOLOCK flag, the user may
>    wish to perform their own file locking on the database file  in  order
>    to prevent multiple writers operating on the same file simultaneously.

This would make GDBM quite standout, except I probably superficially
interpreted what I was reading. My opinion about GDBM comes from the
fact that I cannot have reader + writer with gdbmtool, i.e. run once
"gdbmtool -r db", then "gdbmtool db" – first program will stop working:

stdin:1.5: cannot open db: Can't be reader

The superficial reading was one other DBM database that was stating
boldly about no simultaneous access, but I think it was also stating
that multiple threads are possible. Sorry cannot find that again. But
this would reveal that it can be about proper locking, not simultaneous
access itself. Either way GDBM has nice potential but I might not be
able to leverage it in the module. Fast ztie, hash scan, untie is quite
much as custom locking & unlocking to access database.

> } Maybe it would be good to solve this? FDT_FORKCLOSE or something, there
> } is closeallelse(), for multiio, didn't go deeper in this.
> 
> In the general case a subshell should be inheriting all descriptors from
> the parent, and it's up to the shell programmer not to do stupid things.
> Consider the case where the parent shell wishes to ztie to the database,
> fork off a subshell that will make use of the tied parameter, and then
> go on to exec something else or otherwise exit, leaving the subshell to
> manage the database.

Ok, with file position pointer sanity, and "file has changed" sanity of
GDBM, this is in quite good state and the use case you wrote is
available to the user.

-- 
Sebastian Gniazdowski


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Closing descriptors in forked (pipe) command
  2017-02-19  5:02 ` Bart Schaefer
  2017-02-19  8:04   ` Sebastian Gniazdowski
@ 2017-02-19 10:51   ` Sebastian Gniazdowski
  1 sibling, 0 replies; 4+ messages in thread
From: Sebastian Gniazdowski @ 2017-02-19 10:51 UTC (permalink / raw)
  To: Bart Schaefer, zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1009 bytes --]

On Sat, Feb 18, 2017, at 09:02 PM, Bart Schaefer wrote:
> You/we certainly may wish to document that it is dangerous to assume
> that parent and subshell can both manipulate the tied parameter, but
> it would be incorrect to automatically close the file descriptor in
> either of them.

I've updated docs and added "zgdbmclear" builtin as we now know GDBM is
fine with concurrent access if only it can be synchronized. In Zsh this
is possible via zsystem flock. I think PM_UPTODATE is a great feature as
no other program can change the database. With zsystem flock and
zgdbmclear, one can write a concurrent, elastic program within Zsh (a
challenge, though).

Also added a test case, but tests haven't been commited:

 ztie -d db/gdbm -f $dbfile dbase
 dbase[testkey]=value1
 fun() { while read line; do echo $line; done }
 eval "dbase[testkey]=value2" | fun
 echo $dbase[testkey]
 zgdbmclear dbase testkey
 echo $dbase[testkey]
0:Test store in forked Zsh
>value1
>value2

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: db_gdbm_update2.diff --]
[-- Type: text/plain, Size: 2367 bytes --]

diff --git a/Doc/Zsh/mod_db_gdbm.yo b/Doc/Zsh/mod_db_gdbm.yo
index 699e9ab..7063690 100644
--- a/Doc/Zsh/mod_db_gdbm.yo
+++ b/Doc/Zsh/mod_db_gdbm.yo
@@ -49,6 +49,15 @@ item(tt(zgdbmpath) var(parametername))(
 Put path to database file assigned to var(parametername) into tt(REPLY)
 scalar.
 )
+findex(zgdbmclear)
+cindex(database concurrent access)
+item(tt(zgdbmclear) var(parametername) var(keyname))(
+The tied database is enabled for concurrent access only within single Zsh
+instance. User must use tt(zsystem flock) to guard that access if it involves
+any writes. To refetch a key from database when its change in concurrent
+Zsh process is possible, user should use tt(zgdbmclear) passing name of
+database and name of the key to refetch.
+)
 findex(zgdbm_tied)
 cindex(database tied arrays, enumerating)
 item(tt(zgdbm_tied))(
diff --git a/Src/Modules/db_gdbm.c b/Src/Modules/db_gdbm.c
index 596a8ae..730abea 100644
--- a/Src/Modules/db_gdbm.c
+++ b/Src/Modules/db_gdbm.c
@@ -89,6 +89,7 @@ static struct builtin bintab[] = {
     BUILTIN("ztie", 0, bin_ztie, 1, -1, 0, "d:f:r", NULL),
     BUILTIN("zuntie", 0, bin_zuntie, 1, -1, 0, "u", NULL),
     BUILTIN("zgdbmpath", 0, bin_zgdbmpath, 1, -1, 0, "", NULL),
+    BUILTIN("zgdbmclear", 0, bin_zgdbmclear, 2, -1, 0, "", NULL),
 };
 
 #define ROARRPARAMDEF(name, var) \
@@ -260,6 +261,42 @@ bin_zgdbmpath(char *nam, char **args, Options ops, UNUSED(int func))
     return 0;
 }
 
+/**/
+static int
+bin_zgdbmclear(char *nam, char **args, Options ops, UNUSED(int func))
+{
+    Param pm;
+    char *pmname, *key;
+
+    pmname = *args++;
+    key = *args;
+
+    if (!pmname) {
+        zwarnnam(nam, "parameter name (whose path is to be written to $REPLY) is required");
+        return 1;
+    }
+
+    pm = (Param) paramtab->getnode(paramtab, pmname);
+    if(!pm) {
+        zwarnnam(nam, "no such parameter: %s", pmname);
+        return 1;
+    }
+
+    if (pm->gsu.h != &gdbm_hash_gsu) {
+        zwarnnam(nam, "not a tied gdbm parameter: %s", pmname);
+        return 1;
+    }
+
+    HashTable ht = pm->u.hash;
+    HashNode hn = gethashnode2( ht, key );
+    Param val_pm = (Param) hn;
+    if (val_pm) {
+        val_pm->node.flags &= ~(PM_UPTODATE);
+    }
+
+    return 0;
+}
+
 /*
  * The param is actual param in hash – always, because
  * getgdbmnode creates every new key seen. However, it

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-02-19 10:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-18 12:45 Closing descriptors in forked (pipe) command Sebastian Gniazdowski
2017-02-19  5:02 ` Bart Schaefer
2017-02-19  8:04   ` Sebastian Gniazdowski
2017-02-19 10:51   ` [PATCH] " Sebastian Gniazdowski

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).