[9fans] Strange boot behaviour

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans] Strange boot behaviour
@ 2004-02-12 13:48 Lucio De Re
  2004-02-12 14:35 ` David Presotto
  0 siblings, 1 reply; 24+ messages in thread
From: Lucio De Re @ 2004-02-12 13:48 UTC (permalink / raw)
  To: 9fans mailing list

Given:

sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007
sd53c8xx: SYM53C1010 rev. 0x01 intr=10 command=2300007
sd53c8xx: bios scntl3(70) stest2(00)
sd53c8xx: bios scntl3(70) stest2(00)
ether#0: elnk3: port 0x300 irq 11: 00A0244026C9
Unknown boot device: sd00!9fat!9pcf
Boot devices: fd0 ether0
boot from: ether0!/386/9pcf.gz
tickle (192.96.32.69!67): /386/9pcf.gz
gz...874346 => 867828+1146820+108996=2123644
entry: 80100020
Plan 9
apicbase 0xFEE00100
cpu0: 501MHz GenuineIntel Celeron (cpuid: AX 0x0665 DX 0x183F9FF)
ELCR: 1420
#l0: elnk3: 10Mbps port 0x300 irq 11: 00A0244026C9
sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007
sd53c8xx: SYM53C1010 rev. 0x01 intr=10 command=2300007
#U/usb0: uhci: port 0xE400 irq 10
19287 free pages, 77148K bytes, 309148K swap
root is from (tcp, il, local)[local!#S/sd00/fossil]:
user[none]: glenda
sd53c8xx: bios scntl3(00) stest2(00)
boot: can't connect to file server: '#S/sd00/fossil' does not exist
panic: boot process died: unknown
pdumpstack
anic: boot process died: unknown
ktrace /kernel/path 801serialoq 1758 printed 1770
06978 8000a63c
estackx 8000a8f0
8000a5dc=801067ab 8000a5e4=801ae986 8000a610=8013a6ca 8000a624=80106978
8000a638=80106978 8000a63c=801067af 8000a644=8013a94c 8000a690=801c8960
8000a694=801c89c9 8000a6a4=801c8d4d 8000a6ac=801c8c17 8000a6b8=801c8960
8000a6c4=801c99b1 8000a6d4=801c8b92 8000a6fc=801ca19a 8000a708=8019f282
8000a710=8019f282 8000a718=801b760a 8000a724=801ca50c 8000a730=801aee18
8000a738=801aee64 8000a73c=8019f557 8000a74c=801a9000 8000a758=801a1a79
8000a76c=801abc67 8000a790=801c8b92 8000a7b8=801ca19a 8000a7c8=801b66ca
8000a7d4=801b760a 8000a7d8=801b6ff2 8000a7e8=801b7136 8000a7f4=801b760a
8000a7f8=80307950 8000a7fc=8023431c 8000a800=801c7357 8000a804=8000a820
8000a808=24cdc88c 8000a80c=00000002 8000a810=00001605 8000a814=00000000
8000a818=80300d58 8000a81c=00000000 8000a820=24cdde91 8000a824=00000002
8000a828=805ee480 8000a82c=801b20e9 8000a830=8010603b 8000a834=802ec71c
8000a838=7fffefe8 8000a83c=801c6529 8000a840=00000007 8000a844=00001605
8000a848=00000000 8000a84c=805ee480 8000a850=00001605 8000a854=00000000
8000a858=00000000 8000a85c=7fffefc8 8000a860=802ea978 8000a864=00000000
8000a868=80106d88 8000a86c=80307274 8000a870=00000023 8000a874=00000200
8000a878=7fffefcc 8000a87c=0000001b 8000a880=801027ff 8000a884=00000008
8000a888=8019eead 8000a88c=00000000 8000a890=ffffffff 8000a894=7fffed5c
8000a898=00000000 8000a89c=80100a1e 8000a8a0=8000a8a4 8000a8a4=7fffed5c
8000a8a8=00000000 8000a8ac=00000000 8000a8b0=8000a8c4 8000a8b4=00000000
8000a8b8=ffffffff 8000a8bc=00000001 8000a8c0=00000008 8000a8c4=0000001b
8000a8c8=0000001b 8000a8cc=0000001b 8000a8d0=0000001b 8000a8d4=00000040
8000a8d8=80100569 8000a8dc=00006d96 8000a8e0=00000023 8000a8e4=00000286
8000a8e8=7fffed5c 8000a8ec=0000001b
cpu0: exiting

There are a few questions.

1.  Why does 9load not pick up #S/sd00 as a valid boot location?  It
only offers fd0 and ether0 as options.  Is the missing #S/
significant?  If so, then the installation is faulty.

2.  Why does the fossil kernel 9pcf.gz not find #S/sd00/fossil?  It
was created by the installation procedure and 9pcdisk.gz is aware of
it.  Hm, a missing "disk"?

Note that I have added the PCI IDs for the SCSI controller card to
sd53c8xx.c and this was OK for the installation process.

Lastly, 9loaddebug differs from 9load only in the absence of a -H3
load option and two hash out commands.  It doesn't work, either,
unless the -H3 is entered, in which case it is not different from
9load.  Is it worth keeping?

Suggestions on what I must do to get over this hurdle?  I will try
fossil loaded manually over 9pcdisk.gz, but I'm not sure what I'll
learn from there.

++L


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] Strange boot behaviour
  2004-02-12 13:48 [9fans] Strange boot behaviour Lucio De Re
@ 2004-02-12 14:35 ` David Presotto
  2004-02-12 16:16   ` [9fans] seeking for the truth rog
  2004-02-13  6:15   ` [9fans] Strange boot behaviour Lucio De Re
  0 siblings, 2 replies; 24+ messages in thread
From: David Presotto @ 2004-02-12 14:35 UTC (permalink / raw)
  To: 9fans

On Thu Feb 12 08:50:42 EST 2004, lucio@proxima.alt.za wrote:

> 1.  Why does 9load not pick up #S/sd00 as a valid boot location?  It
> only offers fd0 and ether0 as options.  Is the missing #S/
> significant?  If so, then the installation is faulty.

The #S isn't a problem.  9load doesn't know about # stuff.
Whatever is wrong, this isn't it.

>
> 2.  Why does the fossil kernel 9pcf.gz not find #S/sd00/fossil?  It
> was created by the installation procedure and 9pcdisk.gz is aware of
> it.  Hm, a missing "disk"?

So, if you load a 9pcf and a 9pcdisk both built from the same sources,
one can see #S/sd00/fossil and the other can't?  That should be
impossible.  I have no idea why that would be.

>
> Note that I have added the PCI IDs for the SCSI controller card to
> sd53c8xx.c and this was OK for the installation process.
>

I take it that this:
	sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007
is your disk.  If so, both the 9pcf kernel and 9load found it.  They
just didn't find the partitions.  Can you boot 9pcf off of a
network file server and see if you can look at the device at
all?  Maybe it's not sd00?  What partitions do you see?

This would be the easiest way to figure it all out.

> Lastly, 9loaddebug differs from 9load only in the absence of a -H3
> load option and two hash out commands.  It doesn't work, either,
> unless the -H3 is entered, in which case it is not different from
> 9load.  Is it worth keeping?
>

It's there so we have something to run acid over that is in a format
acid understands.  Acid doesn't do well with the -H3 format.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [9fans] seeking for the truth
  2004-02-12 14:35 ` David Presotto
@ 2004-02-12 16:16   ` rog
  2004-02-12 16:32     ` David Presotto
                       ` (2 more replies)
  2004-02-13  6:15   ` [9fans] Strange boot behaviour Lucio De Re
  1 sibling, 3 replies; 24+ messages in thread
From: rog @ 2004-02-12 16:16 UTC (permalink / raw)
  To: 9fans

i was caught up short just now when i tried to reverse
some lines in acme by selecting them and executing "|tail -r".

it didn't work.
the reason?

/sys/src/cmd/tail.c:94: 	seekable = seek(file,0L,0) == 0;

the seek() is succeeding, even though standard input
isn't actually seekable.

the trouble is, it can't do anything else.

we don't see this problem much because of the special case hack for
"#|" in sseek(), but the tail code is a common hack, and an insidious
one.  there are loads of special files around that don't allow
seeking, and many programs that require files that are seekable (diff
being the age-old example).

surely a better solution is possible?  in general, a fileserver knows
whether its files are seekable or not, because it determines how to
interpret the seek offset.

why not add a new bit to the file attributes, say DMNOSEEK (and
associated QTNOSEEK)?  if set, it would indicate that file offsets
when reading and writing the file will be ignored.  0x2 seems to be
available.

then the above line from tail.c could work, and finally it would be
possible to write programs that know definitively whether they are
able to seek on a particular file or not, rather than failing
strangely sometime later.

the amount of code involved would be tiny, and as far as i can see
it would be completely backwardly compatible.

thoughts?
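
A minimal sketch of how a program might test the proposed bit, offered only as an illustration.  DMNOSEEK and QTNOSEEK are the names from the proposal above and do not exist in Plan 9; the value assumes the mode bit mirrors the qid type bit shifted up by 24, in the way DMDIR (0x80000000) mirrors QTDIR (0x80).  The function name modeseekable is likewise made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical constants: neither exists in Plan 9.
 * The names and the 0x2 value come from the proposal in this message.
 */
enum { QTNOSEEK = 0x02 };
#define DMNOSEEK ((uint32_t)QTNOSEEK << 24)

/*
 * Return 1 if the (hypothetical) no-seek bit is clear in a
 * stat mode word, 0 if the file server has marked the file
 * as ignoring offsets.
 */
int
modeseekable(uint32_t mode)
{
	return (mode & DMNOSEEK) == 0;
}
```

With something like this, the test in tail.c could consult the mode returned by a stat of the file instead of inferring seekability from the result of seek().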

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 16:16   ` [9fans] seeking for the truth rog
@ 2004-02-12 16:32     ` David Presotto
  2004-02-12 17:10       ` rog
  2004-02-12 16:50     ` Dave Lukes
  2004-02-12 20:19     ` boyd, rounin
  2 siblings, 1 reply; 24+ messages in thread
From: David Presotto @ 2004-02-12 16:32 UTC (permalink / raw)
  To: 9fans

> seeking, and many programs that require files that are seekable (diff
> being the age-old example).

except diff doesn't seek the files it is comparing, only its temps...

> surely a better solution is possible?  in general, a fileserver knows
> whether its files are seekable or not, because it determines how to
> interpret the seek offset.

Fixing all the programs that seek without checking might also be nice.

> why not add a new bit to the file attributes, say DMNOSEEK (and
> associated QTNOSEEK)?  if set, it would indicate that file offsets
> when reading and writing the file will be ignored.  0x2 seems to be
> available.

Maybe, need a few days to think about it.  I'm vaguely disturbed by it
but don't know why.  Perhaps because 9p2000 has no concept of seek and
having a bit that talks about it seems odd.  Then again that's true of the
exec permission bit also...



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 16:16   ` [9fans] seeking for the truth rog
  2004-02-12 16:32     ` David Presotto
@ 2004-02-12 16:50     ` Dave Lukes
  2004-02-12 16:59       ` Rob Pike
  2004-02-12 20:19     ` boyd, rounin
  2 siblings, 1 reply; 24+ messages in thread
From: Dave Lukes @ 2004-02-12 16:50 UTC (permalink / raw)
  To: 9fans

Definite plus vote from me.
Nugatory seeks have annoyed me ever since 6th Ed.

Also,
if anything out there _is_ seek()ing without checking the result,
then either it's broke anyway,
or it knows it can safely ignore the result.
So, off the top of my head, I don't see compatibility as an issue.

Obviously this would need some more looking at,
and a pile of "grep seek ..."s, but it sounds like a plan to me ...

	Dave.

On Thu, 2004-02-12 at 16:16, rog@vitanuova.com wrote:
> i was caught up short just now when i tried to reverse
> some lines in acme by selecting them and executing "|tail -r".
>
> it didn't work.
> the reason?
>
> /sys/src/cmd/tail.c:94: 	seekable = seek(file,0L,0) == 0;
>
> the seek() is succeeding, even though standard input
> isn't actually seekable.
>
> the trouble is, it can't do anything else.
>
> we don't see this problem much because of the special case hack for
> "#|" in sseek(), but the tail code is a common hack, and an insidious
> one.  there are loads of special files around that don't allow
> seeking, and many programs that require files that are seekable (diff
> being the age-old example).
>
> surely a better solution is possible?  in general, a fileserver knows
> whether its files are seekable or not, because it determines how to
> interpret the seek offset.
>
> why not add a new bit to the file attributes, say DMNOSEEK (and
> associated QTNOSEEK)?  if set, it would indicate that file offsets
> when reading and writing the file will be ignored.  0x2 seems to be
> available.
>
> then the above line from tail.c could work, and finally it would be
> possible to write programs that know definitively whether they are
> able to seek on a particular file or not, rather than failing
> strangely sometime later.
>
> the amount of code involved would be tiny, and as far as i can see
> it would be completely backwardly compatible.
>
> thoughts?
>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 16:50     ` Dave Lukes
@ 2004-02-12 16:59       ` Rob Pike
  2004-02-12 17:03         ` Fco.J.Ballesteros
  0 siblings, 1 reply; 24+ messages in thread
From: Rob Pike @ 2004-02-12 16:59 UTC (permalink / raw)
  To: 9fans

dave's right. there's no seek concept in 9P, only offsets.

i can't get excited about this.  you used the word 'hack'
yourself to describe the result.

here's a fix:

	|cat|tail -r

-rob



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 16:59       ` Rob Pike
@ 2004-02-12 17:03         ` Fco.J.Ballesteros
  2004-02-12 17:17           ` Charles Forsyth
  0 siblings, 1 reply; 24+ messages in thread
From: Fco.J.Ballesteros @ 2004-02-12 17:03 UTC (permalink / raw)
  To: 9fans

> here's a fix:
>
> 	|cat|tail -r

But this shows a non-uniform behaviour.
I'd expect
	|tail -r
to be exactly like
	|cat|tail -r
but for the buffering involved.

I'm not sure about the real fix. My first attempt would
be to ban seek altogether, although I know that's not feasible in Plan 9...



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 16:32     ` David Presotto
@ 2004-02-12 17:10       ` rog
  0 siblings, 0 replies; 24+ messages in thread
From: rog @ 2004-02-12 17:10 UTC (permalink / raw)
  To: 9fans

presotto:
> except diff doesn't seek the files it is comparing, only its temps...

actually, diff only makes temp files if it thinks it has to,
which is why diff <{gunzip<x.gz} <{gunzip<y.gz} didn't work
(actually i think the checks are a bit more rigorous now).

> Maybe, need a few days to think about it.  I'm vaguely disturbed by it
> but don't know why.  Perhaps because 9p2000 has no concept of seek and
> having a bit that talks about it seems odd.  Then again that's true of the
> exec permission bit also...

could call it "QTIGNORESOFFSETS" but i couldn't think of anything
in that line that was snappy enough.

rob:
> i can't get excited about this.  you used the word 'hack'
> yourself to describe the result.

it's a hack now. it wouldn't be if there was some system support
for it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 17:03         ` Fco.J.Ballesteros
@ 2004-02-12 17:17           ` Charles Forsyth
  2004-02-12 17:22             ` Charles Forsyth
                               ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Charles Forsyth @ 2004-02-12 17:17 UTC (permalink / raw)
  To: 9fans

i'm not enthusiastic about having to change nearly every file server.
status files will be seekable but most synthetic ones won't.
some synthetic files will never be seekable in tail's sense because
they have no end, but they do interpret offsets, so are they QTSEEK or not?

isn't tail's seek an optimisation to avoid reading the file to the end?
unseekable synthetic files typically are length 0 (because they have
no particular length), or tiny (pipes have a hack that stat shows what's in them)
so change the test to declare seekable only if the file is big enough
to warrant it.  one hack deserves another.  then we can get back
to trying to get a version of ghostscript that doesn't blow up on
recent pdf/ps.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 17:17           ` Charles Forsyth
@ 2004-02-12 17:22             ` Charles Forsyth
  2004-02-12 18:00               ` rob pike, esq.
  2004-02-12 17:26             ` Fco.J.Ballesteros
  2004-02-12 17:35             ` Dave Lukes
  2 siblings, 1 reply; 24+ messages in thread
From: Charles Forsyth @ 2004-02-12 17:22 UTC (permalink / raw)
  To: 9fans

>>so change the test to declare seekable only if the file is big enough
>>to warrant it.  one hack deserves another.  then we can get back

simpler might be: seekable = seek(fd, 0, 2) > 0;



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 17:17           ` Charles Forsyth
  2004-02-12 17:22             ` Charles Forsyth
@ 2004-02-12 17:26             ` Fco.J.Ballesteros
  2004-02-12 17:35             ` Dave Lukes
  2 siblings, 0 replies; 24+ messages in thread
From: Fco.J.Ballesteros @ 2004-02-12 17:26 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 275 bytes --]

I'd say this is a fundamental problem.
seek makes sense for files that are always there.
But, as you said, many of your files are made
on demand, and the seek concept does not
apply well to them.

Isn't it enough just to try to keep the seeking programs
to a bare minimum?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 17:17           ` Charles Forsyth
  2004-02-12 17:22             ` Charles Forsyth
  2004-02-12 17:26             ` Fco.J.Ballesteros
@ 2004-02-12 17:35             ` Dave Lukes
  2 siblings, 0 replies; 24+ messages in thread
From: Dave Lukes @ 2004-02-12 17:35 UTC (permalink / raw)
  To: 9fans

> unseekable synthetic files typically are length 0 (because they have
> no particular length), or tiny (pipes have a hack that stat shows what's in them)

The man is, as usual, correct:
I had my mind too far into a un*x model of files/devices/pipes only.
My apologies.

> so change the test to declare seekable only if the file is big enough
> to warrant it.
:-))

>   one hack deserves another.  then we can get back
> to trying to get a version of ghostscript that doesn't blow up on
> recent pdf/ps.

Amen.
	Dave.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 17:22             ` Charles Forsyth
@ 2004-02-12 18:00               ` rob pike, esq.
  2004-02-12 18:05                 ` Charles Forsyth
  2004-02-12 18:10                 ` rog
  0 siblings, 2 replies; 24+ messages in thread
From: rob pike, esq. @ 2004-02-12 18:00 UTC (permalink / raw)
  To: 9fans

> simpler might be: seekable = seek(fd, 0, 2) > 0;

now you're talking.

-rob



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 18:00               ` rob pike, esq.
@ 2004-02-12 18:05                 ` Charles Forsyth
  2004-02-12 18:10                 ` rog
  1 sibling, 0 replies; 24+ messages in thread
From: Charles Forsyth @ 2004-02-12 18:05 UTC (permalink / raw)
  To: 9fans

i ought to have pointed out that in some applications the
pointer needs to be restored after the seek(,2) test; i knew that
at the time, but it made the general case a bit less pretty!
tail does more messing about than i remembered.
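
The test discussed above, with the pointer restored afterwards, might be sketched as follows.  This is an illustration under stated assumptions, not the code in tail.c: it uses POSIX lseek() rather than Plan 9 seek(), and the function name seekable_nonempty is invented for the example.  Note one difference from the behaviour described in this thread: on POSIX systems lseek() on a pipe fails outright, whereas the Plan 9 seek() was reported to succeed.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Heuristic from the thread: call fd seekable only if seeking to
 * the end reports a positive offset, then put the pointer back.
 * Returns 1 if seekable by this test, 0 otherwise.
 */
int
seekable_nonempty(int fd)
{
	off_t cur, end;

	cur = lseek(fd, 0, SEEK_CUR);	/* remember the current offset */
	if(cur < 0)
		return 0;		/* POSIX pipes and sockets fail here */
	end = lseek(fd, 0, SEEK_END);	/* the seek(fd, 0, 2) of the thread */
	if(lseek(fd, cur, SEEK_SET) < 0)	/* restore the pointer */
		return 0;
	return end > 0;
}
```

An empty regular file is reported as not seekable by this test, which matches the intent stated earlier: the seek is only an optimisation, so nothing is lost by reading a short or empty file to the end.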

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 18:10                 ` rog
@ 2004-02-12 18:10                   ` David Presotto
  0 siblings, 0 replies; 24+ messages in thread
From: David Presotto @ 2004-02-12 18:10 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 119 bytes --]

It's true of qio files in general, including pipes.

I've used it often to know how much I can read without blocking.

[-- Attachment #2: Type: message/rfc822, Size: 1977 bytes --]

From: rog@vitanuova.com
To: 9fans@cse.psu.edu
Subject: Re: [9fans] seeking for the truth
Date: Thu, 12 Feb 2004 18:10:18 0000
Message-ID: <3b828656ea8bbad209d8a8a91f883860@vitanuova.com>

> > simpler might be: seekable = seek(fd, 0, 2) > 0;
>
> now you're talking.

pity it doesn't work on /net/tcp/0/data.

but i always thought that using the file size for "amount of unbuffered data"
was a nasty thing to do... does anything actually use that property?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 18:00               ` rob pike, esq.
  2004-02-12 18:05                 ` Charles Forsyth
@ 2004-02-12 18:10                 ` rog
  2004-02-12 18:10                   ` David Presotto
  1 sibling, 1 reply; 24+ messages in thread
From: rog @ 2004-02-12 18:10 UTC (permalink / raw)
  To: 9fans

> > simpler might be: seekable = seek(fd, 0, 2) > 0;
>
> now you're talking.

pity it doesn't work on /net/tcp/0/data.

but i always thought that using the file size for "amount of unbuffered data"
was a nasty thing to do... does anything actually use that property?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] seeking for the truth
  2004-02-12 16:16   ` [9fans] seeking for the truth rog
  2004-02-12 16:32     ` David Presotto
  2004-02-12 16:50     ` Dave Lukes
@ 2004-02-12 20:19     ` boyd, rounin
  2 siblings, 0 replies; 24+ messages in thread
From: boyd, rounin @ 2004-02-12 20:19 UTC (permalink / raw)
  To: 9fans

> thoughts?

seeking on pipes doesn't scale.

adding more junk to support special cases is a bad idea.




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] Strange boot behaviour
  2004-02-12 14:35 ` David Presotto
  2004-02-12 16:16   ` [9fans] seeking for the truth rog
@ 2004-02-13  6:15   ` Lucio De Re
  2004-02-13 12:58     ` Lucio De Re
  2004-02-14 21:49     ` David Presotto
  1 sibling, 2 replies; 24+ messages in thread
From: Lucio De Re @ 2004-02-13  6:15 UTC (permalink / raw)
  To: 9fans

On Thu, Feb 12, 2004 at 09:35:59AM -0500, David Presotto wrote:
>
> On Thu Feb 12 08:50:42 EST 2004, lucio@proxima.alt.za wrote:
>
> > 1.  Why does 9load not pick up #S/sd00 as a valid boot location?  It
> > only offers fd0 and ether0 as options.  Is the missing #S/
> > significant?  If so, then the installation is faulty.
>
> The #S isn't a problem.  9load doesn't know about # stuff.
> Whatever is wrong, this isn't it.
>
There is no 9pcf in 9fat, but the error message suggests that the
"device" is unknown.  New trace below.
> >
> > 2.  Why does the fossil kernel 9pcf.gz not find #S/sd00/fossil?  It
> > was created by the installation procedure and 9pcdisk.gz is aware of
> > it.  Hm, a missing "disk"?
>
> So, if you load a 9pcf and a 9pcdisk both built from the same sources,
> one can see #S/sd00/fossil and the other can't?  That should be
> impossible.  I have no idea why that would be.
>
It isn't impossible, unless I'm misreading the new trace (appended
below).  It is extremely unlikely that the 9pcdisk and 9pcf kernels
are significantly different because of something I did and I suppose
supplying sources etc won't really help unless I supply the hardware
too :-(
>
> I take it that this:
> 	sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007

and the next one, I'm not sure which one was used.  Maybe there's a
problem with a twin controller?

> is your disk.  If so, both the 9pcf kernel and 9load found it.  They
> just didn't find the partitions.  Can you boot 9pcf off of a
> network file server and see if you can look at the device at
> all?  Maybe it's not sd00?  What partitions do you see?
>
9pcf is loaded off the network, it turned out to be the easiest route.
But I think you're asking something else, really.

> This would be the easiest way to figure it all out.
>
> > Lastly, 9loaddebug differs from 9load only in the absence of a -H3
> > load option and two hash out commands.  It doesn't work, either,
> > unless the -H3 is entered, in which case it is not different from
> > 9load.  Is it worth keeping?
> >
>
> It's there so we have something to run acid over that is in a format
> acid understands.  Acid doesn't do well with the -H3 format.

That explains it.  I was hoping it was a version with debug enabled, but
that's not applicable.  I'll do some homework.

Here is the new boot trace, using 9pcdisk.gz instead of 9pcf.gz,
annotated:

----- cut here -----
sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007
sd53c8xx: SYM53C1010 rev. 0x01 intr=10 command=2300007
sd53c8xx: bios scntl3(70) stest2(00)
sd53c8xx: bios scntl3(70) stest2(00)
ether#0: elnk3: port 0x300 irq 11: 00A0244026C9
Unknown boot device: sd00!9fat!9pcf
^^^^^^^ I can put 9pcf on 9fat, if it helps
Boot devices: fd0 ether0
boot from: ether0!/386/9pcdisk.gz
           ^^^^^^^^^^^^^^^^^^^^^^ note network boot
tickle (192.96.32.69!67): /386/9pcdisk.gz
gz...729520 => 867796+818960+108996=1795752
entry: 80100020
Plan 9
apicbase 0xFEE00100
cpu0: 501MHz GenuineIntel Celeron (cpuid: AX 0x0665 DX 0x183F9FF)
ELCR: 1420
#l0: elnk3: 10Mbps port 0x300 irq 11: 00A0244026C9
sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007
sd53c8xx: SYM53C1010 rev. 0x01 intr=10 command=2300007
#U/usb0: uhci: port 0xE400 irq 10
19335 free pages, 77340K bytes, 309340K swap
root is from (tcp, il, local)[local!#S/sd00/fossil]: il
^^^^^^^^^^^^                                         ^^ and network FS
user[none]: glenda
sd53c8xx: bios scntl3(00) stest2(00)
sd53c8xx: bios scntl3(00) stest2(00)
version...
!Adding key: dom=proxima.alt.za proto=p9sk1
user[glenda]:
password:
!
time...
init: starting /bin/rc
dossrv: serving #s/dos
9660srv 68: serving /srv/9660
----- cut here -----


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] Strange boot behaviour
  2004-02-13  6:15   ` [9fans] Strange boot behaviour Lucio De Re
@ 2004-02-13 12:58     ` Lucio De Re
  2004-02-14 21:49     ` David Presotto
  1 sibling, 0 replies; 24+ messages in thread
From: Lucio De Re @ 2004-02-13 12:58 UTC (permalink / raw)
  To: 9fans

On Fri, Feb 13, 2004 at 08:15:48AM +0200, Lucio De Re wrote:
>
> On Thu, Feb 12, 2004 at 09:35:59AM -0500, David Presotto wrote:
> >
> > So, if you load a 9pcf and a 9pcdisk both built from the same sources,
> > one can see #S/sd00/fossil and the other can't?  That should be
> > impossible.  I have no idea why that would be.
> >
> It isn't impossible, unless I'm misreading the new trace (appended
> below).  It is extremely unlikely that the 9pcdisk and 9pcf kernels
> are significantly different because of something I did and I suppose
> supplying sources etc won't really help unless I supply the hardware
> too :-(
> >
For the record, running fossil on top of a network loaded 9pcdisk
(with the necessary tweaks to recognise the SCSI controller - added
below unless I forget), seems just fine.  Not that I've done anything
extraordinary with it, not a Venti in sight, for example.

I just want the machine to be self-contained and running a CPU
kernel, so I still need a solution to my problem.  My feeling is
that somehow the partition information is not being read correctly,
but there isn't any obvious reason for it.

If no one has a better suggestion, I'll try to add debugging
information to 9load and the various kernels and watch what happens,
over the weekend.

++L


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [9fans] Strange boot behaviour
  2004-02-13  6:15   ` [9fans] Strange boot behaviour Lucio De Re
  2004-02-13 12:58     ` Lucio De Re
@ 2004-02-14 21:49     ` David Presotto
  2004-02-16  5:52       ` Lucio De Re
  2004-02-16  6:39       ` Lucio De Re
  1 sibling, 2 replies; 24+ messages in thread
From: David Presotto @ 2004-02-14 21:49 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1269 bytes --]

You had two problems:

1) 9load told you

	Unknown boot device: sd00!9fat!9pcf

   i.e., it wasn't seeing the 9fat partition.

2) your booted kernel could not start from the partition

	root is from (tcp, il, local)[local!#S/sd00/fossil]:
	user[none]: glenda
	sd53c8xx: bios scntl3(00) stest2(00)
	boot: can't connect to file server: '#S/sd00/fossil' does not exist

All I suggested was that you boot 9pcf off of the net, connect to a root
file system on another machine, and then start from there to debug
your problem, i.e., see if you could then see '#S/sd00/fossil' etc.

You clearly did all the nice booting as your carets indicate:

boot from: ether0!/386/9pcdisk.gz
           ^^^^^^^^^^^^^^^^^^^^^^ note network boot
...
root is from (tcp, il, local)[local!#S/sd00/fossil]: il
^^^^^^^^^^^^                                         ^^ and network FS

However you never got to the important part, i.e., to figure out
why 9pcf couldn't see your disk.  Why did you boot 9pcdisk instead
of a 9pcf?  We need to find out why 9pcf won't find your disk.  Booting
a kernel you already know works won't get you anywhere closer to the
problem.  You need to get a 9pcf working on your machine and then start
trying to figure out what is wrong.



* Re: [9fans] Strange boot behaviour
  2004-02-14 21:49     ` David Presotto
@ 2004-02-16  5:52       ` Lucio De Re
  2004-02-16  6:39       ` Lucio De Re
  1 sibling, 0 replies; 24+ messages in thread
From: Lucio De Re @ 2004-02-16  5:52 UTC (permalink / raw)
  To: 9fans

On Sat, Feb 14, 2004 at 04:49:16PM -0500, David Presotto wrote:
>
> However you never got to the important part, i.e., to figure out
> why 9pcf couldn't see your disk.  Why did you boot 9pcdisk instead
> of a 9pcf?  We need to find out why 9pcf won't find your disk.  Booting
Because I had already booted 9pcf in the previous trial, the one I
first mailed off.  Sorry, I should have repeated the details.  I did
mention in my second message that my first try had used net booting.

> a kernel you already know works won't get you anywhere closer to the
> problem.  You need to get a 9pcf working on your machine and then start
> trying to figure out what is wrong.

Point taken.  I will repeat some of the experiments.  I was hoping to
get to them over the weekend; I'll have to do it today.

++L



* Re: [9fans] Strange boot behaviour
  2004-02-14 21:49     ` David Presotto
  2004-02-16  5:52       ` Lucio De Re
@ 2004-02-16  6:39       ` Lucio De Re
  2004-02-16  9:22         ` Lucio De Re
  1 sibling, 1 reply; 24+ messages in thread
From: Lucio De Re @ 2004-02-16  6:39 UTC (permalink / raw)
  To: 9fans

On Sat, Feb 14, 2004 at 04:49:16PM -0500, David Presotto wrote:
>
> All I suggested was that you boot 9pcf off of the net, connect to a root
> file system on another machine, and then start from there to debug
> your problem, i.e., see if you could then see '#S/sd00/fossil' etc.
>
I did miss something: "connect to a root file system on another
machine".  Will do now...

++L



* Re: [9fans] Strange boot behaviour
  2004-02-16  6:39       ` Lucio De Re
@ 2004-02-16  9:22         ` Lucio De Re
  2004-02-17 15:20           ` Lucio De Re
  0 siblings, 1 reply; 24+ messages in thread
From: Lucio De Re @ 2004-02-16  9:22 UTC (permalink / raw)
  To: 9fans

On Mon, Feb 16, 2004 at 08:39:26AM +0200, Lucio De Re wrote:
> On Sat, Feb 14, 2004 at 04:49:16PM -0500, David Presotto wrote:
> >
> > All I suggested was that you boot 9pcf off of the net, connect to a root
> > file system on another machine, and then start from there to debug
> > your problem, i.e., see if you could then see '#S/sd00/fossil' etc.
> >
> I did miss something: "connect to a root file system on another
> machine".  Will do now...
>
I missed something even more obvious:

	boot from: ether0!/386/9pcf.gz
	tickle (192.96.32.69!67): /386/9pcf.gz
	gz...
	874346 => 867828+1146820+108996=2123644
	entry: 80100020

	Plan 9
	apicbase 0xFEE00100
	cpu0: 501MHz GenuineIntel Celeron (cpuid: AX 0x0665 DX 0x183F9FF)
	ELCR: 1420
	#l0: elnk3: 10Mbps port 0x300 irq 11: 00A0244026C9
	sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007
	sd53c8xx: SYM53C1010 rev. 0x01 intr=10 command=2300007
	#U/usb0: uhci: port 0xE400 irq 10
	19287 free pages, 77148K bytes, 309148K swap
	root is from (tcp, il, local)[local!#S/sd00/fossil]: local
					    ^^^^^^^^ this is OK
	user[none]: lucio
	sd53c8xx: bios scntl3(00) stest2(00)
	sd53c8xx: bios scntl3(00) stest2(00)
	bopanic: boot process died: unknown
	ot: can't connect to file server: '#S/sdC0/' file does not exist
					   ^^^^^^^^ this I overlooked
	pdumpstack
	anic: boot process died: unknown

So I built a new kernel using pccpuf (my eventual target) with a
modified boot line as /386/9pccpufs.gz.

The effect was different, but disappointing:

	boot from: ether0!/386/9pccpufs.gz
	tickle (192.96.32.69!67): /386/9pccpufs.gz
	gz...870094 => 864947+1145560+142496=2153003
	entry: 80100020

	Plan 9
	apicbase 0xFEE00100
	cpu0: 501MHz GenuineIntel Celeron (cpuid: AX 0x0665 DX 0x183F9FF)
	ELCR: 1420
	#l0: elnk3: 10Mbps port 0x300 irq 11: 00A0244026C9
	sd53c8xx: SYM53C1010 rev. 0x01 intr=5 command=2300007
	sd53c8xx: SYM53C1010 rev. 0x01 intr=10 command=2300007
	#U/usb0: uhci: port 0xE400 irq 10
	22495 free pages, 89980K bytes, 729980K swap
	root is from (tcp, il, local)[local!#S/sd00/fossil]:
	sd53c8xx: bios scntl3(00) stest2(00)
	sd53c8xx: bios scntl3(00) stest2(00)
	can't read nvram: i/o error
	authid: proxima
	authdom: proxima.alt.za
	secstore key:
	password: password:
	can't write key to nvram: fd out of range or not open

Now it no longer complains about #S/sd00, but it still can't find
the partitions.  Wasn't there some mail about setting up the
partitions early, a while back?  I missed the importance of that
thread at the time, I'll refresh my memory right now.

Well, it looks to me like I need some adjustments to 9load to
identify the partitions on a SCSI disk, as (a) 9load itself fails
to find the partition table (but prep/fdisk succeed) and (b) that
would explain why 9pcf gets similarly lost.

I'll use Russ's part.c of 8 Nov '03 to do some testing.

++L



* Re: [9fans] Strange boot behaviour
  2004-02-16  9:22         ` Lucio De Re
@ 2004-02-17 15:20           ` Lucio De Re
  0 siblings, 0 replies; 24+ messages in thread
From: Lucio De Re @ 2004-02-17 15:20 UTC (permalink / raw)
  To: 9fans

On Mon, Feb 16, 2004 at 11:22:12AM +0200, Lucio De Re wrote:
>
> On Mon, Feb 16, 2004 at 08:39:26AM +0200, Lucio De Re wrote:
>
> Well, it looks to me like I need some adjustments to 9load to
> identify the partitions on a SCSI disk, as (a) 9load itself fails
> to find the partition table (but prep/fdisk succeed) and (b) that
> would explain why 9pcf gets similarly lost.
>
Sorry to leave you all on tenterhooks <grin>...

I was offline yesterday, just before sending off a long, probably
off-topic comment about progress of some description.

It turned out that the SCSI subsystem needed tweaking.  I still don't
know why, but the most significant change was in:

/n/dump/2004/0216/sys/src/boot/pc/sdscsi.c:71 a /sys/src/boot/pc/sdscsi.c:72,75
>if(r->sense[12] == 0x04 && (r->sense[13] == 0x02 || r->sense[13] == 0x01)){
>	status = SDok;
> 	break;
>}

(leading spaces removed for readability).  This was adapted from
/sys/src/9/pc/sdscsi.c where the test reads:
>if(r->sense[12] == 0x04 && r->sense[13] == 0x02){

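For anyone who wants to see the test in isolation, here is a minimal
sketch of it outside the driver.  It is an illustration only, not the
code in sdscsi.c: the function name unitstarting, the enum names and
the length check are my own assumptions.  In the SCSI additional sense
code tables, ASC 0x04 is "logical unit not ready"; ASCQ 0x01 means the
unit is in the process of becoming ready and ASCQ 0x02 means an
initializing command is required.

```c
#include <stddef.h>

enum {
	SenseAsc  = 12,	/* additional sense code byte (fixed-format sense data) */
	SenseAscq = 13,	/* additional sense code qualifier byte */
};

/*
 * Hypothetical helper, not part of sdscsi.c.
 * Return 1 if the sense data reports "logical unit not ready"
 * (ASC 0x04) with qualifier 0x01 or 0x02, otherwise 0.
 */
int
unitstarting(const unsigned char *sense, size_t len)
{
	/* too short to contain the ASC/ASCQ bytes */
	if(sense == NULL || len <= SenseAscq)
		return 0;
	if(sense[SenseAsc] != 0x04)
		return 0;
	return sense[SenseAscq] == 0x01 || sense[SenseAscq] == 0x02;
}
```

With this shape, the 9/pc test corresponds to accepting only the 0x02
qualifier, and the boot/pc change corresponds to accepting 0x01 as well.
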
Having fixed this, I have a Venti-less Fossil.  While we're on the
subject of SCSI, the following changes to sd53c8xx.c are advantageous:

sd53c8xx.c:1825 c /n/dump/2004/0212/sys/src/boot/pc/sd53c8xx.c:1818
< 	KPRINT("sd53c8xx: bios scntl3(%.2x) stest2(%.2x)\n", c->bios.scntl3, c->bios.stest2);
---
> 	print("sd53c8xx: bios scntl3(%.2x) stest2(%.2x)\n", c->bios.scntl3, c->bios.stest2);
sd53c8xx.c:1851 d /n/dump/2004/0212/sys/src/boot/pc/sd53c8xx.c:1843
< #define SYM_1011_DID	0x0021
sd53c8xx.c:1871 d /n/dump/2004/0212/sys/src/boot/pc/sd53c8xx.c:1862
< { SYM_1011_DID,   0xff, "SYM53C1010",	Burst128, 16, 64, Prefetch|LocalRAM|BigFifo|Wide|Ultra|Ultra2 },

The first part silences an annoying debug statement (Nigel Roles
may disagree, still...), the latter allows me to use the KOUWELL
"PCI to Dual Channel Ultra 3 SCSI Card" that uses a slightly newer
SYM53C1010 controller.  To be precise the identification seems to
be SYM53C1010-66 if someone wants/needs to be pedantic.
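
To show why one #define and one table row are enough, here is a
simplified sketch of a device-ID table lookup.  It is not the driver's
code: the trimmed-down Variant struct, the findvariant function and the
reading of 0xff as a maximum revision ID are assumptions made for
illustration.  Only the device ID 0x0021 and the name string come from
the diff above.

```c
#include <stddef.h>

enum {
	SYM_1011_DID = 0x0021,	/* device ID taken from the diff above */
};

/* Hypothetical, reduced version of a per-chip table entry. */
typedef struct Variant Variant;
struct Variant {
	unsigned short	did;	/* PCI device ID */
	unsigned char	maxrid;	/* assumed: highest revision ID accepted */
	const char	*name;	/* name printed at probe time */
};

static const Variant variant[] = {
	{ SYM_1011_DID, 0xff, "SYM53C1010" },
};

/* Return the matching table entry, or NULL if the chip is unknown. */
const Variant*
findvariant(unsigned short did, unsigned char rid)
{
	size_t i;

	for(i = 0; i < sizeof variant / sizeof variant[0]; i++)
		if(variant[i].did == did && rid <= variant[i].maxrid)
			return &variant[i];
	return NULL;
}
```

In a scheme like this, a controller whose device ID is missing from the
table is simply never claimed, which would match the card being ignored
until the extra row is added.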

Analogous changes apply to /sys/src/9/pc/sd53c8xx.c.  There is a
discrepancy between the 9/pc and boot/pc versions that may or may
not be significant; drop me a line if you want the details.

Then, to finish off, Fossil.  I seem to have screwed up my
installation by mismatching fossil binaries.  I don't recall the exact
details, but I definitely had different executables in /boot/fossil
and /386/bin/fossil/fossil.  By the time I copied /boot/fossil into
the /386/bin/fossil directory, I think some damage was already done.

I'm not sure if I'll be able to fix it, either, as what is probably a
third version of fossil/flchk (sigh!) reports:

	squiggle# fossil/flchk -f /dev/sd00/fossil
	cacheLocalData: addr=80992 type got 0 exp 8: tag got 76c04b7a exp 1
	fsOpen error
	fatal error: could not open file system: block label mismatch

I don't think I have enough time now to investigate further.  I do
wonder, though, if problems with "snap -a" might have the same
origin.

I certainly did report flchk failures a long time ago, immediately
after a manual snap -a and the reason I didn't pursue it at the
time was because the manual warns one against taking such failures
seriously when fossil is active.  That warning needs rewording as
it encourages one to expect the worst whether fossil is running or
not.

++L


end of thread, other threads:[~2004-02-17 15:20 UTC | newest]

Thread overview: 24+ messages
2004-02-12 13:48 [9fans] Strange boot behaviour Lucio De Re
2004-02-12 14:35 ` David Presotto
2004-02-12 16:16   ` [9fans] seeking for the truth rog
2004-02-12 16:32     ` David Presotto
2004-02-12 17:10       ` rog
2004-02-12 16:50     ` Dave Lukes
2004-02-12 16:59       ` Rob Pike
2004-02-12 17:03         ` Fco.J.Ballesteros
2004-02-12 17:17           ` Charles Forsyth
2004-02-12 17:22             ` Charles Forsyth
2004-02-12 18:00               ` rob pike, esq.
2004-02-12 18:05                 ` Charles Forsyth
2004-02-12 18:10                 ` rog
2004-02-12 18:10                   ` David Presotto
2004-02-12 17:26             ` Fco.J.Ballesteros
2004-02-12 17:35             ` Dave Lukes
2004-02-12 20:19     ` boyd, rounin
2004-02-13  6:15   ` [9fans] Strange boot behaviour Lucio De Re
2004-02-13 12:58     ` Lucio De Re
2004-02-14 21:49     ` David Presotto
2004-02-16  5:52       ` Lucio De Re
2004-02-16  6:39       ` Lucio De Re
2004-02-16  9:22         ` Lucio De Re
2004-02-17 15:20           ` Lucio De Re
