Debian bug report logs - #595
0.93R4 lockup, filesystem damage
Package: image (?); Reported by: Bill Mitchell <mitchell@mdd.comm.mot.com>; 100 days old.
Message received at debian-bugs:
From mdd.comm.mot.com!mitchell Mon Mar 13 19:52:22 1995
Return-Path: <mitchell@mdd.comm.mot.com>
Received: from pixar.com by mongo.pixar.com with smtp
(Smail3.1.28.1 #15) id m0roNef-0002siC; Mon, 13 Mar 95 19:52 PST
Received: from motgate.mot.com by pixar.com with SMTP id AA12417
(5.65c/IDA-1.4.4); Mon, 13 Mar 1995 19:52:16 -0800
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.6.10/8.6.10/MOT-3.5) with ESMTP id VAA22219; Mon, 13 Mar 1995 21:52:17 -0600
Received: from mdd.comm.mot.com (mdisea.mdd.comm.mot.com [138.242.64.201]) by pobox.mot.com (8.6.10/8.6.10/MOT-3.5) with SMTP id VAA05324; Mon, 13 Mar 1995 21:52:15 -0600
Received: from bb29c.mdd.comm.mot.com by mdd.comm.mot.com (4.1/SMI-4.1)
id AA03276; Mon, 13 Mar 95 19:52:14 PST
Received: by bb29c.mdd.comm.mot.com (4.1/SMI-4.1)
id AA00751; Mon, 13 Mar 95 19:52:10 PST
Date: Mon, 13 Mar 1995 19:52:10 -0800 (PST)
From: Bill Mitchell <mitchell@mdd.comm.mot.com>
X-Sender: mitchell@bb29c
To: Bruce Perens <bruce@pixar.com>, debian-bugs@pixar.com
Subject: Re: Bug#595: 0.93R4 lockup, filesystem damage
In-Reply-To: <Pine.SUN.3.90.950312115436.28170A-100000@bb29c>
Message-Id: <Pine.SUN.3.90.950313193512.743B-100000@bb29c>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Bruce P. said:
> > I'm not sure I believe that was the problem. However, it's easy to find
> > this sort of thing before it gets really bad. After running with an
> > earlier kernel, boot the system single-user, keep the root read-only,
> > and run fsck manually.
I responded:
> I'll give it a try, look for failures, and post any results which look
> meaningful. BTW, the earlier kernel was 1.1.83, I think.
I'm afraid that I didn't follow your suggestion too closely. Last
night I booted linux 1.1.83 on my 0.93R4 root, removed and installed
a bunch of packages, did a kernel build, and removed and installed
a bunch of packages while the build was going on. All worked
well, and I shut down a few hours later without rebooting. Tonight,
I tried to reboot with 1.1.94, and watched fsck complain about a lot
of problems and delete a bunch of files while coming up. The system
never reached a stable login prompt. It would offer "login: ", then
complain about aha1542c problems, offer "login: " again, and loop.
I booted up my trusty maint partition and ran fsck manually from
there. I'll append a session log of that below. After that fsck,
I was able to reboot my 0.93R4 root with the 1.1.94 kernel OK, but
got a screen full of trash on login. Investigation showed that
/etc/motd contained trash.
I reloaded 0.93R4 from scratch for tonight's work, and I don't
think I'll be rooting it under earlier kernels from here on out.
If there's a known kernel vintage prior to which the 0.93R4 root
has problems, we should probably check that while the root is
still read-only. If not, it's probably not a big deal. Booting
early kernels should be rare, and getting rarer as time passes.
Here's the fsck session log, if it's of any use.
Script started on Mon Mar 13 08:59:23 1995
bash# e2fsck -a /dev/sda1
Duplicate or bad blocks in use!
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
bash# e2fsck /dev/sda1
e2fsck 0.5a, 5-Apr-94 for EXT2 FS 0.5, 94/03/10
Pass 1: Checking inodes, blocks, and sizes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 2105: 18714 18715 18716 18717 18718 18719 18720 18721 18722 18723 18724 18725 18726 18727
Duplicate/bad block(s) in inode 2106: 18728 18729 18730 18731 18732 18733 18734 18735 18736 18737 18738 18739 18740 18741 18742 18743 18744 18745
Duplicate/bad block(s) in inode 2107: 18746 18747 18748 18749 18750 18751 18752 18753 18754 18755 18756 18757 18758 18759
Duplicate/bad block(s) in inode 3933: 18714 18715 18716 18717 18718 18719 18720 18721 18722 18723 18724 18725 18726 18727
Duplicate/bad block(s) in inode 3934: 18728 18729 18730 18731 18732 18733 18734 18735 18736 18737 18738 18739 18740 18741 18742 18743 18744 18745
Duplicate/bad block(s) in inode 3935: 18746 18747 18748 18749 18750 18751 18752 18753 18754 18755 18756 18757 18758 18759
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 6 inodes containing duplicate/bad blocks.)
File /sbin/getty (inode #3935, mod time Sat Mar 11 01:46:34 1995)
has 14 duplicate blocks, shared with 1 file:
/etc/motd (inode #2107, mod time Sat Mar 11 01:46:34 1995)
Clone duplicate/bad blocks<y>? yes
File /sbin/fsck.minix (inode #3934, mod time Sat Mar 11 01:46:34 1995)
has 18 duplicate blocks, shared with 1 file:
/var/log/messages (inode #2106, mod time Sat Mar 11 01:46:34 1995)
Clone duplicate/bad blocks<y>? yes
File /sbin/clock (inode #3933, mod time Sat Mar 11 01:46:32 1995)
has 14 duplicate blocks, shared with 1 file:
/var/log/debug (inode #2105, mod time Sat Mar 11 01:46:32 1995)
Clone duplicate/bad blocks<y>? yes
File /etc/motd (inode #2107, mod time Sat Mar 11 01:46:34 1995)
has 14 duplicate blocks, shared with 1 file:
/sbin/getty (inode #3935, mod time Sat Mar 11 01:46:34 1995)
Duplicated blocks already reassigned or cloned.
File /var/log/messages (inode #2106, mod time Sat Mar 11 01:46:34 1995)
has 18 duplicate blocks, shared with 1 file:
/sbin/fsck.minix (inode #3934, mod time Sat Mar 11 01:46:34 1995)
Duplicated blocks already reassigned or cloned.
File /var/log/debug (inode #2105, mod time Sat Mar 11 01:46:32 1995)
has 14 duplicate blocks, shared with 1 file:
/sbin/clock (inode #3933, mod time Sat Mar 11 01:46:32 1995)
Duplicated blocks already reassigned or cloned.
Pass 2: Checking directory structure
Entry 'installkernel' in /sbin (3922) has deleted/unused inode 3936.
Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Check reference counts.
Unattached inode 2108
Connect to /lost+found<y>? yes
Inode 2108 has ref count 2, expecting 1.
Set i_nlinks to count<y>? yes
Pass 5: Checking group summary information.
Fix summary information<y>? yes
Block bitmap differences: -9267 -9268 -9269 -9270 -9271 -9272 -9273 -9274 -9275 -9276 -9277 -9278 -9279 -9280 -9281 -9282 -9283 -9285 -9286 -9287 -9288 -9289 -9290 -9291 -9293 -9308 -9309 -9310 -9311 -9312 -9313 -9314 -9315 -9316 -9317 -9319 -9321 -9322 -9323 -9324 -9325 -9326 -9327 -9328 -9329 -9330 -9331 -9332 -9333 -9335 -9336 -9337 -9338 -9339 -9340 -9341 -9342 -9343 -9358 -9359 -9360 -9361 -9362 -9363 -9364 -9365 -9366 -9367 -9368 -9369 -9370 -9371 -9372 -9373 -9376 -9377 -9378 -9379 -9380 -9381 -9382 -9383 -9384 -9385 -9386 -9387 -9388 -9389 -9390 -9391 -9392 -9393 -9394 -9395 -9425 -9426 -9427 -9428 -9429 -9430 -9431 -9432 -9433 -9435 -9436 -9437 -9438 -9439 -9440 -9441 -9448 -9449 -9450 -9451 -9458 -9459 -9460 -9461 -9462 -9463 -9464 -9465 -9473 -9474 -9531. FIXED
Free blocks count wrong for group 0 (7829, counted=7783). FIXED
Free blocks count wrong for group 1 (5538, counted=5663). FIXED
Free blocks count wrong (24673, counted=24752). FIXED
Inode bitmap differences: +2108. FIXED
Free inodes count wrong for group #1 (1667, counted=1666). FIXED
Free inodes count wrong for group #2 (1807, counted=1808). FIXED
/dev/sda1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda1: 890/7680 files, 5952/30704 blocks
bash# e2fsck /dev/sda1
e2fsck 0.5a, 5-Apr-94 for EXT2 FS 0.5, 94/03/10
/dev/sda1 is clean, no check.
bash# e2fsck /dev/sda2
/dev/sda2: 5275/17928 files, 46252/71680 blocks
bash#
Script done on Mon Mar 13 09:00:45 1995
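The Pass 1B lines in the session log above list, per inode, the blocks that more than one inode claims. As a rough illustration of how those lines pair up, here is a minimal Python sketch that groups inodes by the blocks they share. The function name `shared_blocks`, the regular expression, and the shortened `SAMPLE` text are assumptions made for this example; they are not part of e2fsck.

```python
import re
from collections import defaultdict

# Lines in the format e2fsck printed during Pass 1B (block lists shortened here).
SAMPLE = """\
Duplicate/bad block(s) in inode 2105: 18714 18715 18716
Duplicate/bad block(s) in inode 2107: 18746 18747
Duplicate/bad block(s) in inode 3933: 18714 18715 18716
Duplicate/bad block(s) in inode 3935: 18746 18747
"""

# Hypothetical pattern for this sketch: inode number, then one or more block numbers.
LINE_RE = re.compile(r"Duplicate/bad block\(s\) in inode (\d+):((?: \d+)+)")


def shared_blocks(log_text):
    """Map each group of inodes to the sorted list of blocks they all claim."""
    owners = defaultdict(set)          # block number -> inodes claiming it
    for match in LINE_RE.finditer(log_text):
        inode = int(match.group(1))
        for block in match.group(2).split():
            owners[int(block)].add(inode)
    groups = defaultdict(list)         # tuple of inodes -> blocks they share
    for block, inodes in sorted(owners.items()):
        if len(inodes) > 1:
            groups[tuple(sorted(inodes))].append(block)
    return dict(groups)


if __name__ == "__main__":
    for inodes, blocks in shared_blocks(SAMPLE).items():
        print(inodes, "share", len(blocks), "blocks")
```

Run against the full log, the same grouping would be expected to pair inode 2105 with 3933, 2106 with 3934, and 2107 with 3935, which matches the file pairs that Pass 1D reports.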
Message sent:
From: iwj10@thor.cam.ac.uk (Ian Jackson)
To: Bill Mitchell <mitchell@mdd.comm.mot.com>
Subject: Bug#595: Info received (was Bug#595: 0.93R4 lockup, filesystem damage)
In-Reply-To: <Pine.SUN.3.90.950313193512.743B-100000@bb29c>
References: <Pine.SUN.3.90.950313193512.743B-100000@bb29c>
Thank you for the additional information you have supplied regarding
this problem report. It has been forwarded to the developers to
accompany the original report.
If you wish to continue to submit further information on your problem,
please do the same thing again: send it to debian-bugs@pixar.com, ensuring
that the Subject line starts with "Bug#595" or "Re: Bug#595" so that
we can identify it as relating to the same problem.
Please do not reply to the address at the top of this message,
unless you wish to report a problem with the bug-tracking system.
Ian Jackson
(maintainer, debian-bugs)
Message sent to debian-devel@pixar.com:
Subject: Bug#595: 0.93R4 lockup, filesystem damage
Reply-To: Bill Mitchell <mitchell@mdd.comm.mot.com>, debian-bugs@pixar.com
Resent-To: debian-devel@pixar.com
Resent-From: Bill Mitchell <mitchell@mdd.comm.mot.com>
Resent-Sender: iwj10@cus.cam.ac.uk
Resent-Date: Tue, 14 Mar 1995 04:03:04 GMT
Resent-Message-ID: <debian-bugs-handler.595.031403532814818@pixar.com>
X-Debian-PR-Package: image (?)
X-Debian-PR-Keywords:
Received: via spool for debian-bugs; Tue, 14 Mar 1995 04:03:04 GMT
Received: with rfc822 via encapsulated-mail id 031403532814818;
Tue, 14 Mar 1995 03:53:28 GMT
Message received at debian-bugs:
From mdd.comm.mot.com!mitchell Sun Mar 12 12:08:18 1995
Return-Path: <mitchell@mdd.comm.mot.com>
Received: from pixar.com by mongo.pixar.com with smtp
(Smail3.1.28.1 #15) id m0rntw1-00063EC; Sun, 12 Mar 95 12:08 PST
Received: from motgate.mot.com by pixar.com with SMTP id AA22190
(5.65c/IDA-1.4.4); Sun, 12 Mar 1995 12:08:13 -0800
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.6.10/8.6.10/MOT-3.5) with ESMTP id OAA15420; Sun, 12 Mar 1995 14:08:14 -0600
Received: from mdd.comm.mot.com (mdisea.mdd.comm.mot.com [138.242.64.201]) by pobox.mot.com (8.6.10/8.6.10/MOT-3.5) with SMTP id OAA23569; Sun, 12 Mar 1995 14:08:13 -0600
Received: from bb29c.mdd.comm.mot.com by mdd.comm.mot.com (4.1/SMI-4.1)
id AA28308; Sun, 12 Mar 95 12:08:12 PST
Received: by bb29c.mdd.comm.mot.com (4.1/SMI-4.1)
id AA28185; Sun, 12 Mar 95 12:08:03 PST
Date: Sun, 12 Mar 1995 12:08:02 -0800 (PST)
From: Bill Mitchell <mitchell@mdd.comm.mot.com>
X-Sender: mitchell@bb29c
To: Bruce Perens <bruce@pixar.com>
Cc: debian-bugs@pixar.com, debian-bugs@pixar.com
Subject: Re: Bug#595: 0.93R4 lockup, filesystem damage
In-Reply-To: <m0rnt5E-0006DDC@mongo.pixar.com>
Message-Id: <Pine.SUN.3.90.950312115436.28170A-100000@bb29c>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Sun, 12 Mar 1995, Bruce Perens wrote:
> > After reset-switch reboot, fsck
> > aborted with the message, "fsck failed. Please repair manually and
> > reboot. ...". After the fsck abort, I was left at a bash prompt,
> > from which I remounted the root filesystem rw and ran fsck manually.
>
> Oops. _Don't_ have the filesystem mounted for write when you run fsck.
> Fsck will repair the filesystem while it is mounted read-only, because
> fsck does I/O directly to the block device, and doesn't touch the
> mounted filesystem.
I'd say, then, that the message fsck puts up for this type of failure
is likely to mislead people into mounting rw before doing the manual
fsck. The exact message is:
fsck failed. Please repair manually and reboot. Please note
that the root filesystem is currently mounted read-only. To
remount it read-write:
bash# mount -n -o remount,rw /
CONTROL-D will reboot the system.
bash#
And the user is left at the "bash#" prompt. I typed CONTROL-D to
reboot once, cycled back through the same message to the "bash#"
prompt, remounted root read-write as this message seemed to suggest,
and ran fsck manually.
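One way to avoid the trap described above is to check how the filesystem is mounted before starting a manual repair. The following Python sketch is only an illustration: it assumes mount information is available as text in the `/proc/mounts` layout (device, mount point, type, options), and the helper names `mount_mode` and `safe_to_fsck` are made up for this example.

```python
def mount_mode(mounts_text, mount_point):
    """Return "ro", "rw", or None for mount_point, given /proc/mounts-style text.

    Each line is assumed to look like: device mount_point fstype options ...
    The last matching line wins, since a later mount shadows an earlier one.
    """
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        options = fields[3].split(",")
        if "ro" in options:
            mode = "ro"
        elif "rw" in options:
            mode = "rw"
    return mode


def safe_to_fsck(mounts_text, mount_point):
    """True when the filesystem is not mounted at all or is mounted read-only."""
    return mount_mode(mounts_text, mount_point) in (None, "ro")


# Hypothetical sample data for the sketch.
EXAMPLE_MOUNTS = "/dev/sda1 / ext2 ro 0 0\n/dev/sda2 /usr ext2 rw 0 0\n"
```

With `EXAMPLE_MOUNTS`, `safe_to_fsck(EXAMPLE_MOUNTS, "/")` is true while `safe_to_fsck(EXAMPLE_MOUNTS, "/usr")` is false, which is the distinction the failure message does not make clear.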
> > [... fsck put up messages about older kernels ...]
>
> I'm not sure I believe that was the problem. However, it's easy to find
> this sort of thing before it gets really bad. After running with an
> earlier kernel, boot the system single-user, keep the root read-only,
> and run fsck manually.
I'll give it a try, look for failures, and post any results which look
meaningful. BTW, the earlier kernel was 1.1.83, I think.
>[...]
> Have you had I/O errors on your disks? [...]
The only recurring disk problems I recall have been after system
lockups which led to reset-switch reboots. fsck is understandably
unhappy about that after reboot. I think I had one round of major
problems out of the blue. That was likely after running an old
kernel for a while on a 0.93R4 installation, but I don't recall
details.
Message sent:
From: iwj10@thor.cam.ac.uk (Ian Jackson)
To: Bill Mitchell <mitchell@mdd.comm.mot.com>
Subject: Bug#595: Info received (was Bug#595: 0.93R4 lockup, filesystem damage)
In-Reply-To: <Pine.SUN.3.90.950312115436.28170A-100000@bb29c>
References: <Pine.SUN.3.90.950312115436.28170A-100000@bb29c>
Message sent to debian-devel@pixar.com:
Subject: Bug#595: 0.93R4 lockup, filesystem damage
Reply-To: Bill Mitchell <mitchell@mdd.comm.mot.com>, debian-bugs@pixar.com
Resent-To: debian-devel@pixar.com
Resent-From: Bill Mitchell <mitchell@mdd.comm.mot.com>
Resent-Sender: iwj10@cus.cam.ac.uk
Resent-Date: Sun, 12 Mar 1995 20:18:01 GMT
Resent-Message-ID: <debian-bugs-handler.595.03122009243801@pixar.com>
X-Debian-PR-Package: image (?)
X-Debian-PR-Keywords:
Received: via spool for debian-bugs; Sun, 12 Mar 1995 20:18:01 GMT
Received: with rfc822 via encapsulated-mail id 03122009243801;
Sun, 12 Mar 1995 20:09:24 GMT
Message received at debian-bugs:
From pixar.com!bruce Sun Mar 12 11:13:49 1995
Return-Path: <bruce@pixar.com>
Received: from pixar.com by mongo.pixar.com with smtp
(Smail3.1.28.1 #15) id m0rnt5I-0006DDC; Sun, 12 Mar 95 11:13 PST
Received: from mongo.pixar.com by pixar.com with SMTP id AA21587
(5.65c/IDA-1.4.4 for <debian-bugs@pixar.com>); Sun, 12 Mar 1995 11:13:46 -0800
Received: by mongo.pixar.com (Smail3.1.28.1 #15)
id m0rnt5E-0006DDC; Sun, 12 Mar 95 11:13 PST
Message-Id: <m0rnt5E-0006DDC@mongo.pixar.com>
Date: Sun, 12 Mar 95 11:13 PST
From: bruce@pixar.com (Bruce Perens)
To: debian-bugs@pixar.com, debian-bugs@pixar.com,
Bill Mitchell <mitchell@mdd.comm.mot.com>
Subject: Re: Bug#595: 0.93R4 lockup, filesystem damage
> After reset-switch reboot, fsck
> aborted with the message, "fsck failed. Please repair manually and
> reboot. ...". After the fsck abort, I was left at a bash prompt,
> from which I remounted the root filesystem rw and ran fsck manually.
Oops. _Don't_ have the filesystem mounted for write when you run fsck.
Fsck will repair the filesystem while it is mounted read-only, because
fsck does I/O directly to the block device, and doesn't touch the
mounted filesystem. If you have the filesystem mounted for write, it
will _un-do_ some of the repairs that fsck has done when it writes
blocks such as the superblock back to the filesystem after fsck has
already written different versions of the same data.
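The effect described above can be shown with a toy model. This Python sketch is not how the kernel or fsck is implemented; `ToyDisk`, `ToyMount`, and `toy_fsck` are hypothetical names, and the "superblock" is reduced to a single free-block count. It only illustrates why a cached copy written back by a read-write mount can overwrite a repair made directly on the device.

```python
class ToyDisk:
    """A block device holding one 'superblock' value (the free block count)."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks


class ToyMount:
    """A mounted filesystem that caches the superblock in memory."""

    def __init__(self, disk, read_only):
        self.disk = disk
        self.read_only = read_only
        self.cached_free_blocks = disk.free_blocks   # read once at mount time

    def sync(self):
        # A read-write mount writes its cached copy back to the device.
        if not self.read_only:
            self.disk.free_blocks = self.cached_free_blocks


def toy_fsck(disk, counted):
    """Write the corrected count straight to the device, bypassing the mount."""
    disk.free_blocks = counted


def repair_survives(read_only):
    """Return True if the repaired count is still on disk after a later sync."""
    disk = ToyDisk(free_blocks=24673)        # stale count before repair
    mount = ToyMount(disk, read_only)
    toy_fsck(disk, counted=24752)            # repaired count
    mount.sync()                             # cache flushed after the repair
    return disk.free_blocks == 24752
```

In this model `repair_survives(True)` holds and `repair_survives(False)` does not, because the read-write mount writes the stale cached value over the repaired one.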
> I didn't note the exact messages, but I do recall fsck remarking that
> some of the problems it saw were likely caused by running earlier
> kernels. That makes sense, as I have occasionally run earlier kernels
> on my 0.93R4 installation. However much it makes sense, though, it's
> not a good thing.
I'm not sure I believe that was the problem. However, it's easy to find
this sort of thing before it gets really bad. After running with an
earlier kernel, boot the system single-user, keep the root read-only,
and run fsck manually.
Often a problem starts out with blocks appearing in _both_ the free
list and in files (this is the "dup block in free list" condition).
That's repairable until a file gets written that takes the block from
the free list and then puts it in another file. Then you have a "dup"
block that belongs to two files, and you have to delete both files
because it's sure that at least one contains corrupt data. If you run
fsck right after a problem might have happened, you'll probably catch
this while the dups are still in free.
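The progression from a "dup in free" to a block shared by two files can be traced with a small model. This is a minimal Python sketch under simplifying assumptions: the free list is a set, each file is a list of block numbers, and the allocator hands out the lowest free block. The names `find_conflicts` and `allocate` are invented for the illustration.

```python
def find_conflicts(free_list, files):
    """Return (dups_in_free, dups_between_files) for a toy filesystem.

    free_list: set of block numbers marked free
    files: dict mapping file name -> list of block numbers
    """
    owners = {}
    for name, blocks in files.items():
        for block in blocks:
            owners.setdefault(block, []).append(name)
    dups_in_free = sorted(b for b in owners if b in free_list)
    dups_between_files = {
        b: names for b, names in sorted(owners.items()) if len(names) > 1
    }
    return dups_in_free, dups_between_files


def allocate(free_list, files, name):
    """Give the lowest free block to `name`, as a simple allocator might."""
    block = min(free_list)
    free_list.remove(block)
    files.setdefault(name, []).append(block)
    return block


# Stage 1: block 7 is wrongly present in both the free list and file "a".
free_list = {7, 8, 9}
files = {"a": [5, 6, 7]}
stage1 = find_conflicts(free_list, files)

# Stage 2: a new file is written and the allocator hands out block 7 again.
allocate(free_list, files, "b")
stage2 = find_conflicts(free_list, files)
```

At stage 1 the only problem is block 7 appearing in the free list, which can be fixed by removing it from the list. At stage 2 the free list looks clean, but block 7 now belongs to both "a" and "b", and at least one of them holds the wrong data.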
Have you had I/O errors on your disks? I find that ext2fs (and probably
most other Linux filesystems) aren't robust around disk write failures.
There are also order-of-operation problems that I'm not sure every
filesystem gets right. For instance, you should always write the free list
block after removing blocks from the free list _before_ you write the block
list of a file to which you've allocated the blocks. This will prevent the
"dups in free" problem.
Bruce
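The ordering argument in the message above can be checked with a toy crash simulation. This Python sketch assumes an allocation takes exactly two separate disk writes and that a crash can land between them; `allocate_with_crash` and `classify` are hypothetical helpers, not code from any filesystem.

```python
def allocate_with_crash(order, crash_after_first_write):
    """Simulate allocating block 7 to a file with two separate disk writes.

    order: "free_first" writes the updated free list before the file's
           block list; "file_first" writes them the other way round.
    Returns the on-disk state as (free_list, file_blocks).
    """
    disk_free = {7, 8, 9}      # on-disk free list before the allocation
    disk_file = [5, 6]         # on-disk block list of the file

    new_free = disk_free - {7}
    new_file = disk_file + [7]

    if order == "free_first":
        writes = [("free", new_free), ("file", new_file)]
    elif order == "file_first":
        writes = [("file", new_file), ("free", new_free)]
    else:
        raise ValueError(order)

    for index, (target, value) in enumerate(writes):
        if target == "free":
            disk_free = value
        else:
            disk_file = value
        if crash_after_first_write and index == 0:
            break               # lockup or power loss: second write never lands
    return disk_free, disk_file


def classify(disk_free, disk_file):
    """Label the on-disk state after the (possibly interrupted) allocation."""
    if 7 in disk_free and 7 in disk_file:
        return "dup in free"      # block is both free and in use
    if 7 not in disk_free and 7 not in disk_file:
        return "leaked block"     # block is unaccounted for, but not shared
    return "consistent"
```

In this model a crash with the free list written first leaves a leaked block, which a later check can return to the free list without touching file data. A crash with the file written first leaves the "dup in free" state.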
--
Bruce Perens AB6YM 510-215-3502 Bruce@Pixar.com
Attention Ham Radio Operators: For information on "Linux for Hams", read
the World Wide Web page http://www.rahul.net/perens/LinuxForHams
Message sent:
From: iwj10@thor.cam.ac.uk (Ian Jackson)
To: bruce@pixar.com (Bruce Perens)
Subject: Bug#595: Info received (was Bug#595: 0.93R4 lockup, filesystem damage)
In-Reply-To: <m0rnt5E-0006DDC@mongo.pixar.com>
References: <m0rnt5E-0006DDC@mongo.pixar.com>
Message sent to debian-devel@pixar.com:
Subject: Bug#595: 0.93R4 lockup, filesystem damage
Reply-To: bruce@pixar.com (Bruce Perens), debian-bugs@pixar.com
Resent-To: debian-devel@pixar.com
Resent-From: bruce@pixar.com (Bruce Perens)
Resent-Sender: iwj10@cus.cam.ac.uk
Resent-Date: Sun, 12 Mar 1995 19:18:01 GMT
Resent-Message-ID: <debian-bugs-handler.595.03121914541740@pixar.com>
X-Debian-PR-Package: image (?)
X-Debian-PR-Keywords:
Received: via spool for debian-bugs; Sun, 12 Mar 1995 19:18:01 GMT
Received: with rfc822 via encapsulated-mail id 03121914541740;
Sun, 12 Mar 1995 19:14:54 GMT
Message received at debian-bugs:
From mdd.comm.mot.com!mitchell Sun Mar 12 08:06:45 1995
Return-Path: <mitchell@mdd.comm.mot.com>
Received: from pixar.com by mongo.pixar.com with smtp
(Smail3.1.28.1 #15) id m0rnqAG-00062HC; Sun, 12 Mar 95 08:06 PST
Received: from motgate.mot.com by pixar.com with SMTP id AA20541
(5.65c/IDA-1.4.4 for <debian-bugs@pixar.com>); Sun, 12 Mar 1995 08:06:41 -0800
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.6.10/8.6.10/MOT-3.5) with ESMTP id KAA02146 for <debian-bugs@pixar.com>; Sun, 12 Mar 1995 10:06:42 -0600
Received: from mdd.comm.mot.com (mdisea.mdd.comm.mot.com [138.242.64.201]) by pobox.mot.com (8.6.10/8.6.10/MOT-3.5) with SMTP id KAA02271 for <debian-bugs@pixar.com>; Sun, 12 Mar 1995 10:06:41 -0600
Received: from bb29c.mdd.comm.mot.com by mdd.comm.mot.com (4.1/SMI-4.1)
id AA22206; Sun, 12 Mar 95 08:06:39 PST
Received: by bb29c.mdd.comm.mot.com (4.1/SMI-4.1)
id AA28003; Sun, 12 Mar 95 08:06:27 PST
Date: Sun, 12 Mar 1995 08:06:26 -0800 (PST)
From: Bill Mitchell <mitchell@mdd.comm.mot.com>
X-Sender: mitchell@bb29c
To: debian-bugs@pixar.com
Subject: 0.93R4 lockup, filesystem damage
Message-Id: <Pine.SUN.3.90.950312075139.27999A-100000@bb29c>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
PACKAGE: image (?)
VERSION: 0.93R4
This morning, I had the system lock up. I had vi running on one
VC and others at the bash prompt. The vi session stopped responding,
and I couldn't run ps from one shell prompt, or reboot from another.
Ctrl-Alt-Del produced no reaction. After reset-switch reboot, fsck
aborted with the message, "fsck failed. Please repair manually and
reboot. ...". After the fsck abort, I was left at a bash prompt,
from which I remounted the root filesystem rw and ran fsck manually.
The manual fsck reported repairing numerous problems, and eventually
got me to a login prompt. During the fsck/repair, numerous files
were deleted (/usr/bin/elvis is one file I found missing soon
after my next login).
I didn't note the exact messages, but I do recall fsck remarking that
some of the problems it saw were likely caused by running earlier
kernels. That makes sense, as I have occasionally run earlier kernels
on my 0.93R4 installation. However much it makes sense, though, it's
not a good thing.
If there is a kernel-vintage point before which our 0.93R4 installation
(currently with a 1.1.94 kernel) is not backwards compatible, is it
possible to detect an attempt to start up with a too-early kernel
early enough during startup to abort startup with a message and without
file system damage? If so, I'd suggest that this be done.
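A rough sketch of the kind of check suggested above follows. It only shows
the version comparison. The function names are hypothetical, and the minimum
version passed in is a placeholder, since this report does not establish
where (or whether) such a cutoff exists.

```python
import re


def parse_kernel_version(release):
    """Turn a release string such as "1.1.94" into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def kernel_too_old(release, minimum):
    """Return True if `release` is older than `minimum`.

    `minimum` is an assumed cutoff supplied by the caller.
    """
    return parse_kernel_version(release) < parse_kernel_version(minimum)
```

Comparing tuples of integers rather than raw strings matters here: as
strings, "1.1.9" would sort after "1.1.83". A startup script could run a
check like this before the root filesystem is remounted for writing and
stop with a message if it returns True.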
mitchell@mdd.comm.mot.com (Bill Mitchell)
Message sent:
From: iwj10@thor.cam.ac.uk (Ian Jackson)
To: Bill Mitchell <mitchell@mdd.comm.mot.com>
Subject: Bug#595: Acknowledgement (was: 0.93R4 lockup, filesystem damage)
In-Reply-To: <Pine.SUN.3.90.950312075139.27999A-100000@bb29c>
References: <Pine.SUN.3.90.950312075139.27999A-100000@bb29c>
Thank you for the problem report you have sent regarding Debian GNU/Linux.
This is an automatically generated reply, to let you know your message has
been received. It is being forwarded to the developers' mailing list for
their attention; they will reply in due course.
If you wish to submit further information on your problem, please send
it to debian-bugs@pixar.com, but please ensure that the Subject
line of your message starts with "Bug#595" or "Re: Bug#595" so that
we can identify it as relating to the same problem.
Please do not reply to the address at the top of this message,
unless you wish to report a problem with the bug-tracking system.
Ian Jackson
(maintainer, debian-bugs)
Message sent to debian-devel@pixar.com:
Subject: Bug#595: 0.93R4 lockup, filesystem damage
Reply-To: Bill Mitchell <mitchell@mdd.comm.mot.com>, debian-bugs@pixar.com
Resent-To: debian-devel@pixar.com
Resent-From: Bill Mitchell <mitchell@mdd.comm.mot.com>
Resent-Sender: iwj10@cus.cam.ac.uk
Resent-Date: Sun, 12 Mar 1995 16:18:02 GMT
Resent-Message-ID: <debian-bugs-handler.595.031216075325215@pixar.com>
X-Debian-PR-Package: image (?)
X-Debian-PR-Keywords:
Received: via spool for debian-bugs; Sun, 12 Mar 1995 16:18:02 GMT
Received: with rfc822 via encapsulated-mail id 031216075325215;
Sun, 12 Mar 1995 16:07:53 GMT
Received: from pixar.com by mongo.pixar.com with smtp
(Smail3.1.28.1 #15) id m0rnqAG-00062HC; Sun, 12 Mar 95 08:06 PST
Received: from motgate.mot.com by pixar.com with SMTP id AA20541
(5.65c/IDA-1.4.4 for <debian-bugs@pixar.com>); Sun, 12 Mar 1995 08:06:41 -0800
Received: from pobox.mot.com (pobox.mot.com [129.188.137.100]) by motgate.mot.com (8.6.10/8.6.10/MOT-3.5) with ESMTP id KAA02146 for <debian-bugs@pixar.com>; Sun, 12 Mar 1995 10:06:42 -0600
Received: from mdd.comm.mot.com (mdisea.mdd.comm.mot.com [138.242.64.201]) by pobox.mot.com (8.6.10/8.6.10/MOT-3.5) with SMTP id KAA02271 for <debian-bugs@pixar.com>; Sun, 12 Mar 1995 10:06:41 -0600
Received: from bb29c.mdd.comm.mot.com by mdd.comm.mot.com (4.1/SMI-4.1)
id AA22206; Sun, 12 Mar 95 08:06:39 PST
Received: by bb29c.mdd.comm.mot.com (4.1/SMI-4.1)
id AA28003; Sun, 12 Mar 95 08:06:27 PST
Date: Sun, 12 Mar 1995 08:06:26 -0800 (PST)
From: Bill Mitchell <mitchell@mdd.comm.mot.com>
X-Sender: mitchell@bb29c
To: debian-bugs@pixar.com
Message-Id: <Pine.SUN.3.90.950312075139.27999A-100000@bb29c>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
PACKAGE: image (?)
VERSION: 0.93R4
This morning, I had the system lock up. I had vi running on one
VC and others at the bash prompt. The vi session stopped responding,
and I couldn't run ps from one shell prompt, or reboot from another.
Ctrl-Alt-Del produced no reaction. After reset-switch reboot, fsck
aborted with the message, "fsck failed. Please repair manually and
reboot. ...". After the fsck abort, I was left at a bash prompt,
from which I remounted the root filesystem rw and ran fsck manually.
The manual fsck reported repairing numerous problems, and eventually
got me to a login prompt. During the fsck/repair, numerous files
were deleted (/usr/bin/elvis is one file I found missing soon
after my next login).
I didn't note the exact messages, but I do recall fsck remarking that
some of the problems it saw were likely caused by running earlier
kernels. That makes sense, as I have occasionally run earlier kernels
on my 0.93R4 installation. However much it makes sense, though, it's
not a good thing.
If there is a kernel-vintage point before which our 0.93R4 installation
(currently with a 1.1.94 kernel) is not backwards compatible, is it
possible to detect an attempt to start up with a too-early kernel
early enough during startup to abort startup with a message and without
file system damage? If so, I'd suggest that this be done.
mitchell@mdd.comm.mot.com (Bill Mitchell)
Ian Jackson /
iwj10@thor.cam.ac.uk,
with the debian-bugs tracking mechanism