Shell argument splitting behaviour

From: Peter Stephenson @ 2008-10-01 13:02 UTC (permalink / raw)
  To: Zsh hackers list

I discovered this inconvenience in the parameter splitting flag (z), which
splits words in a similar way to how command line arguments are handled.

  foo="(one) (two) (three)"
  print -l ${(z)foo}

prints

  (
  one
  )
  (two)
  (three)

That's because the command word in the line is treated differently; in
this case, it looks like the start of a subshell.  I wasn't expecting it
when splitting a string, because it's just an arbitrary set of words,
and my first reaction was to change it (which is easy enough) but I
suppose you can think of it as a feature.  The same feature occurs when
the line editor splits arguments: in insert-last-word and
copy-prev-shell-word.  In those cases the current behaviour is right,
although it'll only rarely make a difference.

I thought I'd mention it in case anyone else had any reactions.

What I was trying to do was use this to get lisp-like lists of
arguments (since after the first word parentheses have to be balanced),
but I can get that to work just by putting a dummy word in front, so
it's actually not a major concern.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

* Re: Shell argument splitting behaviour
From: Peter Stephenson @ 2008-10-03 13:56 UTC (permalink / raw)
  Cc: Zsh hackers list

On Wed, 01 Oct 2008 14:02:02 +0100
Peter Stephenson <pws@csr.com> wrote:
> What I was trying to do was use this to get lisp-like lists of
> arguments (since after the first word parentheses have to be balanced),
> but I can get that to work just by putting a dummy word in front, so
> it's actually not a major concern.

In case there's any interest, here's what I came up with for my own use.
The list_word function handles list-style trees showing what can follow what.
It's essentially yet another way of doing argument handling, optimised for
a different case.  The _dynamic_directory_name function, also for my own use,
shows roughly how to use this; it completes colon-separated parts as in
~[p1:u:main].

[-- Attachment #2: list_word --]
[-- Type: application/octet-stream, Size: 7970 bytes --]

#autoload
#
# This function can be used to retrieve the list of words that are allowed
# at each point in an ordered list of words where the next word depends on
# the previous one, given a lisp-style input and the words so far.  For
# example, if the first argument may be "one" "two" or "three", but "one"
# is always followed by the second argument "eins", and likewise "two" by
# "zwei" and "three" by "drei", this function will list the possibilities
# at each step.  (See the second and third of the "Examples" below for how
# this particular example works.)
#
# Input:
#   - a lisp-like tree structure as a single string.  The outermost
#     list, which includes the entire tree, contains the specification of a
#     set of words.  The specification for each word is also a list that
#     may consist of one or two elements.  The first element is the word
#     itself; the optional second element is a recursive list in the same
#     format as the outermost list that specifies words that may follow.
#
#   - The main tree structure may optionally be preceded by a number of named
#     trees, in the form "name-:tree", where tree has the identical form
#     to the main tree (so is surrounded by parentheses).  These
#     may appear wherever a subtree, i.e. a list describing following words,
#     may appear in the code, for example the single argument
#       ((one ((eins))) (two ((zwei) (dos) (duo))))
#     and the two arguments
#       twoarg-:((zwei) (dos) (duo))
#       ((one ((eins))) (two -:twoarg))
#     are equivalent.  Recursive use of named subtrees is possible, so
#       A-:((A -:B))
#       B-:((B -:A))
#       ((A -:B))
#     describes any number of alternating words A B A ...
#
#   - words matched so far as separate arguments.
#
# Output:
#   sets reply to the list of possible values that can come next and returns
#   zero.  If no match, or too many arguments, returns 1.
#
#   Returns status 2 if
#   - an argument before the first surrounded by parentheses did not
#     fit the form *'-:('*')'
#   - no argument matched the form '('*')'
#   - a list describing a single word and (optionally) its following word
#     had more than two elements
#   - if such a list had the form (word -:name), no predefined sublist
#     for name existed.
#
# Further notes on list format:
#   Ordinary shell quoting may be applied to individual elements and will
#   be stripped for comparisons and in the returned array.  No shell
#   expansion is performed.  If in doubt, characters should be quoted
#   since the list is parsed by shell word splitting in which certain
#   characters (such as "<" and ">") are processed as separate words
#   when unquoted.  Quoting also escapes other active forms, including
#   "-:" described below.  Note that the quotes here are additional
#   to any quotes needed to protect the argument to the function from
#   immediate shell expansion.
#
#   Words matched so far (i.e. arguments to the function after the
#   top-level list) are not subject to further quote processing.
#
#   The named trees may be used as part of lists of words (as well
#   as in the second element of word specifications).  In this case,
#   one level of parentheses will be removed and the result used
#   as if it were a list of word specifications.  Hence the arguments:
#     extra-:((more1) (more2) (more3))
#     ((some1) (some2) (some3) -:extra)
#   behave as if all six words were given as top-level possibilities.
#   As an example to distinguish the two uses, the following:
#     extra-:((more1) (more2) (more3))
#     ((some1 -:extra) (some2) (some3) -:extra)
#   has the same effect at the top level, but the first word "some1"
#   can also be followed by the three words defined by -:extra.
#
# Examples:
#   The simplest case:
#     "((one) (two) (three))"
#   sets reply to the three elements one, two, three.
#
#     "((one ((eins))) (two ((zwei))) (three ((drei))))"
#   the same
#
#     "((one ((eins))) (two ((zwei))) (three ((drei))))" two
#   sets reply to the single element zwei
#
#     "((one ((eins))) (two ((zwei) (dos) (duo))) (three ((drei))))" two
#   sets reply to the array consisting of zwei, dos, duo.
#
# Notes:
#   Note that parentheses have two different tasks, hence the proliferating
#   levels.  The outermost parentheses and alternate levels going inward
#   enclose lists of possible values at a particular depth, and there can
#   be as many elements as necessary within each level.  The
#   second-from-outermost parentheses and alternate levels describe
#   a single argument at the current level, together with an optional
#   specification for those that may follow.
#
#   The innermost level(s) of parentheses around a single argument
#   may be missed out; however, this makes it more confusing when
#   attempting to add new levels.
#
#   Unquoted parentheses and whitespace are always significant; use quotes
#   where necessary.
#
#   The code is not tolerant to errors in parentheses.  Use named
#   subtrees to clarify structure.

list_word_expand_tree() {
  # work around the fact that "(" is a keyword if it appears first
  local param=$1 tmptree=": $2"
  set -A $param ${(z)tmptree}
  shift $param
}

list_word() {
  emulate -L zsh
  setopt extendedglob

  local -a match mbegin mend
  typeset -A sublists
  while [[ $1 = (#b)(*)'-:'(\(*\)) ]]; do
    sublists[$match[1]]=$match[2]
    shift
  done

  [[ $# -gt 0 && $1 = \(*\) ]] || return 2

  local tree=$1 elt nexttree
  local -a atree subtree substs
  local -A seen
  shift

  while [[ $# -gt 0 && ${#tree} -ne 0 ]]; do
    if [[ $tree = \(*\) ]]; then
      # at this level this must be a single argument
      tree=$tree[2,-2]
    fi
    list_word_expand_tree atree $tree
    # loop over additional substitutions
    # marking ones we've done in seen
    seen=()
    while true; do
      for elt in $atree; do
	if [[ $elt = \(*\) ]]; then
	  elt=$elt[2,-2]
	  list_word_expand_tree subtree $elt
	  elt=${(Q)subtree[1]}
	  if (( ${#subtree} > 2 )); then
	    return 2
	  fi
	  if [[ $subtree[2] = "-:"* ]]; then
	    nexttree=$sublists[${subtree[2][3,-1]}]
	    [[ -z $nexttree ]] && return 2
	  else
	    nexttree=$subtree[2]
	  fi
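	elif [[ $elt = "-:"* ]]; then
	  elt=${(Q)elt[3,-1]}
	  if [[ -z ${seen[$elt]} ]]; then
	    [[ -z $sublists[$elt] ]] && return 2
	    seen[$elt]=1
	    substs+=($elt)
	  fi
	  # A named-tree reference is never itself a word to match, so skip
	  # the comparison below even if this tree was already queued.
	  continue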
	else
	  elt=${(Q)elt}
	  nexttree=
	fi
	if [[ $elt = $1 ]]; then
	  # matched at this level, dive deeper to the next level
	  shift
	  tree=$nexttree
	  continue 3
	fi
      done
      # Process additional -:stuff we may have picked up.
      (( ${#substs} )) || break
      atree=()
      for elt in $substs; do
	# Strip one level of parentheses.
	elt=${${sublists[$elt]}[2,-2]}
	# Add this to the current level for further processing.
	list_word_expand_tree subtree $elt
	atree+=($subtree)
      done
      substs=()
    done
    return 1
  done

  if [[ $# -eq 0 && ${#tree} -ne 0 ]]; then
    if [[ $tree = \(*\) ]]; then
      # at this level this must be a single argument
      tree=$tree[2,-2]
    fi
    list_word_expand_tree atree $tree
    typeset -ga reply
    reply=()
    seen=()
    while true; do
      for elt in $atree; do
	if [[ $elt = \(*\) ]]; then
	  elt=$elt[2,-2]
	  list_word_expand_tree subtree $elt
	  reply+=(${(Q)subtree[1]})
	elif [[ $elt = "-:"* ]]; then
	  elt=${(Q)elt[3,-1]}
	  if [[ -z ${seen[$elt]} ]]; then
	    [[ -z $sublists[$elt] ]] && return 2
	    seen[$elt]=1
	    substs+=($elt)
	    continue
	  fi
	else
	  reply+=(${(Q)elt})
	fi
      done
      # Process additional -:stuff we may have picked up.
      (( ${#substs} )) || break
      atree=()
      for elt in $substs; do
	# Strip one level of parentheses.
	elt=${${sublists[$elt]}[2,-2]}
	# Add this to the current level for further processing.
	list_word_expand_tree subtree $elt
	atree+=($subtree)
      done
      substs=()
    done
    return 0
  else
    return 1
  fi
}

list_word "$@"

[-- Attachment #3: _dynamic_directory_name --]
[-- Type: application/octet-stream, Size: 680 bytes --]

#autoload

local expl SEPCHAR
local -a dirs parts reply

# Configurable bit
SEPCHAR=:
dirs=(
  "uwb-:((main) (p=buffer32))"
  "dot11-:((main) (v4.0) (v5.0))"
  "proj-:((u -:uwb) (11 -:dot11))"
  "((p1 -:proj) -:proj)"
)
# End config

local -a parts

if [[ $PREFIX = *${SEPCHAR}[^${SEPCHAR}]# ]]; then
  if [[ $SEPCHAR = . ]]; then
    eval parts=\(\"\${\(@s:${SEPCHAR}:\)PREFIX}\"\)
  else
    eval parts=\(\"\${\(@s.${SEPCHAR}.\)PREFIX}\"\)
  fi
  parts=("${(@)parts[1,-2]}")
  compset -P "*$SEPCHAR"
# else leave parts empty and PREFIX as whatever
fi

autoload -Uz list_word
list_word $dirs "${(@)parts}"

_wanted namepart expl "Name part" compadd -S']' -r "$SEPCHAR" -- $reply
