From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <200010210808.e9L88PG47491@ducky.net>
To: 9fans@cse.psu.edu
From: Mike Haertel
Subject: [9fans] bug fix to /sys/src/libdisk/disk.c and /386/bin/format
Date: Sat, 21 Oct 2000 01:08:25 -0700
Topicbox-Message-UUID: 1ae50b78-eac9-11e9-9e20-41e7f4b1d025

I recently attempted to install Plan 9 on a 9GB SCSI disk using a
Tekram controller. The install process appeared to work fine. When
I reached the "bootsetup" step, I chose native Plan 9 boot from the
hard disk. It made the usual whining about disks >2GB [*], but I had
a multi-boot manager in the MBR, so I elected not to install Plan 9's
MBR.

Upon attempting to boot Plan 9, I got to PBS, and then got the
failure: Bad format.

It turned out the reason was that the disk geometry recorded in the
PBS did not match the actual geometry being used by the SCSI BIOS.

I worked backwards through the scripts and commands to reach the
geometry guessing routines in /sys/src/libdisk/disk.c. Since I had
a SCSI disk, the ATA geometry determination method (which is probably
the most reliable) was not applicable.

Upon inspecting the code, I found that opendisk() would attempt to
look for an MBR in the first sector of the disk partition being
opened, and if it found one, look for a consistent geometry in that
MBR. Unfortunately, the first sector of sd00/9fat was *not* the first
sector of the disk, it was just the first sector of the partition,
and contained garbage.

As an experiment, I tried running dd from another window on the
install floppy, and copying the MBR from the real first sector of the
disk to the first sector of the 9fat slice, and then running
"bootsetup". Much to my relief, it created a Plan 9 boot sector with
the correct geometry. I then began working on a real fix.
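
To make the failure mode concrete, here is a minimal sketch of the kind
of check described above: test the sector for the MBR signature, then
accept a geometry only if every used partition entry agrees on its
ending head and sector. This is not the libdisk source. It is written
in portable C rather than against the Plan 9 libc, and the names
hasmbrmagic and guessgeometry, along with the constants, are assumptions
based on the conventional PC MBR layout (table at offset 446, signature
0x55 0xAA at offsets 510 and 511).

```c
#include <assert.h>
#include <stddef.h>

enum {
	Toffset   = 446,	/* partition table offset within the sector */
	NTentry   = 4,		/* entries in the table */
	EntrySize = 16,		/* bytes per entry */
	Magic0    = 0x55,	/* expected at byte 510 */
	Magic1    = 0xAA	/* expected at byte 511 */
};

/* Return 1 if the 512-byte sector ends with the MBR signature, else 0. */
int
hasmbrmagic(const unsigned char *sector)
{
	assert(sector != NULL);
	return sector[510] == Magic0 && sector[511] == Magic1;
}

/*
 * Guess heads and sectors per track from the ending CHS address of
 * each used entry.  Returns 0 and fills *heads and *sectors when all
 * used entries agree; returns -1 when the signature is missing, no
 * entry is in use, or the entries disagree.
 */
int
guessgeometry(const unsigned char *sector, int *heads, int *sectors)
{
	int i, h, s, endh, ends;
	const unsigned char *e;

	assert(sector != NULL && heads != NULL && sectors != NULL);
	if(!hasmbrmagic(sector))
		return -1;
	h = s = -1;
	for(i = 0; i < NTentry; i++){
		e = sector + Toffset + i*EntrySize;
		if(e[4] == 0)		/* type 0: unused entry */
			continue;
		endh = e[5];		/* ending head */
		ends = e[6] & 63;	/* ending sector, low 6 bits */
		if(h == -1){
			h = endh;
			s = ends;
		}else if(h != endh || s != ends)
			return -1;	/* inconsistent entries */
	}
	if(h == -1)
		return -1;
	*heads = h + 1;		/* heads count from 0 */
	*sectors = s;		/* sectors count from 1 */
	return 0;
}
```

A sector of garbage, such as the first sector of a fresh 9fat
partition, will normally fail the signature test, so a check like this
returns -1 and the caller has to fall back on some other guess.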

Enclosed is a fix to opendisk() which modifies the routine
partitiongeometry() to try looking for the MBR first in the whole-disk
partition /dev/sdXX/data, and only if that fails will it look for an
MBR in the first sector of the partition being opened. There is also
another minor bug fix to a call to strdup() whose return area was
being overwritten with a potentially longer string than the original
string given to strdup().

I replaced format on the install floppy with one linked with my new
libdisk, then I bravely zeroed out my 9fat partition contents,
reconstructed the same Plan 9 partitions, and ran bootsetup again.
It worked.

I am very surprised nobody has complained about this before. As far
as I can tell, without this fix Plan 9 will not install correctly on
any reasonably large (> 1GB) SCSI disk unless it is installed near the
very beginning of the disk.

Footnote:

[*] Just out of curiosity, what OSes' MBRs have trouble at the 2GB
boundary? I am very familiar with trouble at the 8GB boundary
requiring LBA addressing, but I have never seen an MBR that chokes
at 2GB.

Diffs (against the October release of Plan 9):

% diff /sys/src/libdisk/disk.c disk.c
63c63,65
< 	Table *t;
---
> 	Table *t = (Table *)(buf + Toffset);
> 	char *rawname;
> 	int rawfd;
65,67c67,92
< 
< 	if(seek(disk->fd, 0, 0) < 0) {
< 		return -1;
---
> 	/*
> 	 * look for an MBR first in the /dev/sdXX/data partition, otherwise
> 	 * attempt to fall back on the current partition.
> 	 */
> 	rawname = malloc(strlen(disk->prefix) + 5);	/* prefix + "data" + nul */
> 	if (rawname) {
> 		strcpy(rawname, disk->prefix);
> 		strcat(rawname, "data");
> 		rawfd = open(rawname, OREAD);
> 		free(rawname);
> 		if (rawfd >= 0
> 		    && seek(rawfd, 0, 0) >= 0
> 		    && readn(rawfd, buf, 512) == 512
> 		    && t->magic[0] == Magic0
> 		    && t->magic[1] == Magic1) {
> 			close(rawfd);
> 		} else {
> 			if (rawfd >= 0)
> 				close(rawfd);
> 			if (seek(disk->fd, 0, 0) < 0
> 			    || readn(disk->fd, buf, 512) != 512
> 			    || t->magic[0] != Magic0
> 			    || t->magic[1] != Magic1) {
> 				return -1;
> 			}
> 		}
70,77d94
< 	if(readn(disk->fd, buf, 512) != 512) {
< 		return -1;
< 	}
< 
< 	t = (Table*)(buf+Toffset);
< 	if(t->magic[0] != Magic0 || t->magic[1] != Magic1)
< 		return -1;
< 
279c296
< 	p = strdup(disk);
---
> 	p = malloc(strlen(disk) + 4);	/* disk + "ctl" + nul */
285a303
> 	strcpy(p, disk);
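
The strdup() change in the last two hunks can be illustrated in
isolation. The following is only a sketch in portable C, and the helper
name ctlname is hypothetical: it is not a function in libdisk. It
assumes the surrounding code replaces the last path component with
"ctl", which is why the buffer is sized as strlen(disk) + 4 instead of
the strlen(disk) + 1 bytes that strdup() would provide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of path with its last
 * component replaced by "ctl", or NULL if allocation fails.  The
 * caller frees the result.
 */
char*
ctlname(const char *path)
{
	char *p, *q;

	assert(path != NULL);
	/*
	 * Worst case the last component is empty, so the result is the
	 * whole path plus "ctl" plus the terminating nul.
	 */
	p = malloc(strlen(path) + 4);
	if(p == NULL)
		return NULL;
	strcpy(p, path);
	q = strrchr(p, '/');
	q = q ? q + 1 : p;	/* start of the last component */
	strcpy(q, "ctl");	/* safe even when the component is short */
	return p;
}
```

With a strdup()-sized buffer, a name such as "/dev/sd00/9" would leave
room for only one character plus the nul after the final slash, so
writing "ctl" there would run two bytes past the end of the allocation.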