From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-version: 1.0
Content-transfer-encoding: 7BIT
Content-type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Message-id: <A3AADD7F-E09D-49F9-8A5B-3D6B720046A4@mac.com>
From: dave.l@mac.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: <f631016df731e553421e6079dd1da0d4@quintile.net>
Date: Fri, 30 Oct 2009 15:29:29 +0000
References: <f631016df731e553421e6079dd1da0d4@quintile.net>
Subject: Re: [9fans] sed question (OT)
Topicbox-Message-UUID: 947073d6-ead5-11e9-9d60-3106f5b1d025

You can do it, definitely.

Caveat: I'm in bed with a virus and the brain's on impulse power
so these are untested and may be highly suboptimal.

Is the input guaranteed to have 2 words on each line?
What are your definitions of words and blanks?

I know from your snippet that there are no leading blanks and no empty
lines.

Assuming there are 2 words on every line, something like:
h
s/^[A-Za-z0-9_-]+[^A-Za-z0-9_-]+(.).*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/^(.)\n([A-Za-z0-9_-]+[^A-Za-z0-9_-]+).(.*)/\2\1\3/

ought to roughly work after your fragment.
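
For anyone who wants to try this outside Plan 9, here is a minimal
sketch that glues the first-word fragment from the quoted message to
the second-word step. It is written under some assumptions: GNU sed
with -E (Plan 9 sed takes egrep-style regexps without a flag), at
least two words per line, and a separator that is a run of non-word
characters, which is skipped before the letter is captured. The name
cap2 is made up for the example.

```shell
# cap2: capitalise the first letter of the first two words on each line.
# Assumes GNU sed (-E) and at least two words on every line.
cap2() {
    sed -E '
# first word: fragment from the quoted message
h
s/^(.).*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
x
s/^.(.*)/\1/
x
G
s/\n//
# second word: save the line, isolate the first letter of word two,
# upcase it, then splice it back into the saved copy
h
s/^[A-Za-z0-9_-]+[^A-Za-z0-9_-]+(.).*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/^(.)\n([A-Za-z0-9_-]+[^A-Za-z0-9_-]+).(.*)/\2\1\3/
'
}

printf 'hello world foo\nplan nine from bell labs\n' | cap2
```

With that input it should print "Hello World foo" and
"Plan Nine from bell labs".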

If >= 2 words per line isn't assumed:
h
t urnofflag
: urnofflag
s/^[A-Za-z0-9_-]+[^A-Za-z0-9_-]+([A-Za-z0-9_-]).*/\1/
t for2
b cosnot2wds
: for2
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/^(.)\n([A-Za-z0-9_-]+[^A-Za-z0-9_-]+).(.*)/\2\1\3/
b
: cosnot2wds
g
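
A runnable sketch of the branching variant, again assuming GNU sed
with -E and a word definition of [A-Za-z0-9_-]. The point it tries to
show is the flag reset: "t" fires if any s command has succeeded since
the last input line was read, so the no-op "t urnofflag" clears
whatever the first-word fragment left behind before the real test.
The function name cap2_safe is hypothetical.

```shell
# cap2_safe: like the two-word version, but one-word lines pass through
# with only their first word capitalised. Assumes GNU sed (-E).
cap2_safe() {
    sed -E '
# first word: fragment from the quoted message
h
s/^(.).*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
x
s/^.(.*)/\1/
x
G
s/\n//
# second word, only if there is one
h
# the s commands above succeeded, so clear the t flag first
t urnofflag
: urnofflag
s/^[A-Za-z0-9_-]+[^A-Za-z0-9_-]+([A-Za-z0-9_-]).*/\1/
t for2
b cosnot2wds
: for2
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/^(.)\n([A-Za-z0-9_-]+[^A-Za-z0-9_-]+).(.*)/\2\1\3/
b
: cosnot2wds
# nothing matched; put the saved line back
g
'
}

printf 'hello world\nsolo\nplan 9 rocks\n' | cap2_safe
```

This should print "Hello World", "Solo" and "Plan 9 rocks" (the y
command leaves the digit alone).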

Bizarrely, within its limitations (\n, \0, size limits), sed is, in
some sense, computationally complete, since you can store any number
of things in the pattern and hold spaces (using /(.*\n)/ etc. to
delimit them) and branch conditionally.
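
As a small illustration of the store-and-branch idea, here is a sketch
that capitalises every word on a line rather than just two. It keeps a
newline in the pattern space as a cursor between the processed and
unprocessed parts, and loops with a conditional branch until nothing
but separators is left. Assumptions: GNU sed with -E, the same
[A-Za-z0-9_-] word definition, and a hypothetical function name
cap_all.

```shell
# cap_all: capitalise the first letter of every word on each line.
# Assumes GNU sed (-E). A newline in the pattern space is the cursor.
cap_all() {
    sed -E '
# put the cursor at the start of the line
s/^/\n/
: next
# stop when only separators (or nothing) follow the cursor
/\n[^A-Za-z0-9_-]*$/ b done
# save the line, isolate the first letter after the cursor, upcase it
h
s/.*\n[^A-Za-z0-9_-]*(.).*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
# pattern space becomes: letter, newline, done part, cursor, rest
G
# splice the letter in and move the cursor past the current word
s/^(.)\n(.*)\n([^A-Za-z0-9_-]*).([A-Za-z0-9_-]*)(.*)/\2\3\1\4\n\5/
b next
: done
# drop the cursor
s/\n//
'
}

printf 'hello world foo\n' | cap_all
```

On that input it should print "Hello World Foo".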

Another insane possibility, since there are only 26 variations, is to
do:
	s/^a/A/
	s/^([A-Za-z0-9_-]+[^A-Za-z0-9_-]+)a/\1A/
	s/^b/B/
	s/^([A-Za-z0-9_-]+[^A-Za-z0-9_-]+)b/\1B/
and so on through z.

You can, of course, use sed to create the above script like so:
	echo abcdefghijklmnopqrstuvwxyz | sed ...
Filling in the ellipsis is left as an exercise for the already addled
reader.
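
The sed one-liner stays an exercise. As a cross-check on the table
approach itself, though, the same 52 rules can be produced with a
plain shell loop. This is only a sketch: it assumes a POSIX shell, tr,
and GNU sed with -E, uses a second-word rule that accepts a first word
in either case (so rule order does not matter), and the names
gen_rules and cap2_table are invented for the example.

```shell
# gen_rules: print one first-word rule and one second-word rule
# for each lower-case letter.
gen_rules() {
    for lc in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
        uc=$(printf '%s' "$lc" | tr 'a-z' 'A-Z')
        # letter at the very start of the line
        printf 's/^%s/%s/\n' "$lc" "$uc"
        # letter right after the first word and its separator
        printf 's/^([A-Za-z0-9_-]+[^A-Za-z0-9_-]+)%s/\\1%s/\n' "$lc" "$uc"
    done
}

# cap2_table: run the generated rule table over standard input.
cap2_table() {
    sed -E "$(gen_rules)"
}

printf 'bob alice carol\n' | cap2_table
```

That should print "Bob Alice carol". No hold space and no branches,
at the cost of 52 substitutions tried on every line.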

BTW: if you're shovelling a lot of this kind of muck,
it may, paradoxically, be easier to do it on the command line and use
your shell's variables for the repeated bits of regexps, commands etc.
The only caveats are that this technique will curdle your brain even
more than sed already does
and it may, oddly, be the exception to the rule that rc is more
elegant than sh, due to caret vs. double-quotes.
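
A sketch of what that looks like in sh, assuming GNU sed with -E and
at least two words per line. The variable names w, sep and up, and the
function cap_second, are made up here. Note the doubled backslashes:
inside double quotes the shell needs \\1 to hand sed a \1.

```shell
w='[A-Za-z0-9_-]'       # one word character (assumed definition)
sep='[^A-Za-z0-9_-]'    # one separator character
up='y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'

# cap_second: capitalise only the second word of each line,
# building the sed commands from the variables above.
cap_second() {
    sed -E \
        -e 'h' \
        -e "s/^$w+$sep+(.).*/\\1/" \
        -e "$up" \
        -e 'G' \
        -e "s/^(.)\\n($w+$sep+).(.*)/\\2\\1\\3/"
}

printf 'hello world\n' | cap_second
```

This should print "hello World". In rc the same pieces would be
joined with carets rather than sitting inside one double-quoted
string.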

Apologies for grandstanding, but I used to do this sort of stuff for a
living.
I wrote a piece of training courseware for sed once which had far
worse excesses than the above as examples.
RFC-822 header-reassembly anyone?

I also used to get my intellectual rocks off on stuff like this until
I finally grew up (in my late 40s).

Dave.

SEE ALSO
	teco, assembler, qed.


On 29 Oct 2009, at 15:41, Steve Simon wrote:

> Sorry, not really the place for such questions but...
>
> I always struggle with sed, awk is easy but sed makes my head hurt.
>
> I am trying to capitalise the first two words on each line (I could
> use awk as well but I have to use sed so it seems churlish to start
> another process).
>
> capitalising the first word on the line is easy enough:
>
> 			h
> 			s/^(.).*/\1/
> 			y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
> 			x
> 			s/^.(.*)/\1/
> 			x
> 			G
> 			s/\n//
>
> Though there may be a much easier/more elegant way to do this,
> but for the 2nd word it gets much harder.
>
> What I really want is sam's ability to select a letter and operate
> on it rather than everything being line based as sed seems to be.
>
> any neat solutions? (extra points awarded for use of the branch
> operator :-)
>
> -Steve
>