Hi,
I'm struggling with a system failure (home computer, but still
frustrating) and thought someone might be able to advise me on where
to apply my efforts/euro to get back in action...
Story so far:
I was running Ubuntu 8.04LTS for ages, and last week decided to
upgrade to the new LTS release (Lucid Lynx).
This went OK, and system seemed to be running fine. I didn't do
anything much with the computer during the week (except a couple of
hours of Web), though it was powered up as it's also a printer/scanner
server. When I came to the machine on Friday evening, it was powered
down, which I didn't think much of since there had been people in
working on the heating and any power interruption would leave it
powered down.
However, when I rebooted, after apparently coming up fine, it then
crashed after only a little bit of use (just small stuff like Web,
email, etc.). I tried to reboot and it failed. I then tried safe mode,
and it worked, though I've since found that that was a red herring.
The shut-down is a bit random, and may happen during first stage boot
(after grub, before splash), may come during fsck, or may come after
using the system for a few minutes to half an hour.
If the failure happens early in the boot, I get a stack-trace with
contents that I could only photograph. I'll transcribe some bits here
in case it means anything to anyone...
lots of irq stuff...
ret_from_intr+0x0/0x11
<EOI> <#MC> [<lots of address stuff>] ?panic+0x111/0x137
?panic+0xa1/0x137
mce_panic+0x1e32/0x210
do_machine_check+0x7d3/0x820
machine_check+0x1c/0x30
native_safe_halt+0xb/0x10
<<EOE>> [<lots of address>] ? default_idle+0x3d/0x90
c1e_idle+0x63/0x120
cpu_idle+0xb3/0x110
start_secondary+0xa8/0xaa
end trace
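For anyone who wants to pick through the trace above: each frame has the
form symbol+offset/size. What follows is only a rough Python sketch for
pulling out the frames whose names mention mce or machine_check from a
hand-typed trace. The function names (parse_frames and friends) and the
keyword match are assumptions made up for illustration. It also flags any
frame whose offset is larger than its size, which usually means a slip in
the transcription.

```python
import re

# One frame looks like "symbol+0xOFFSET/0xSIZE" (a leading "?" is ignored).
FRAME_RE = re.compile(r"([A-Za-z_]\w*)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)")


def parse_frames(text):
    """Return (symbol, offset, size) tuples for every frame found in text."""
    return [
        (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        for m in FRAME_RE.finditer(text)
    ]


def machine_check_frames(frames):
    """Keep frames whose symbol name mentions mce or machine_check."""
    return [f for f in frames if "mce" in f[0] or "machine_check" in f[0]]


def suspicious_frames(frames):
    """Keep frames whose offset exceeds the function size (likely a typo)."""
    return [f for f in frames if f[1] > f[2]]


# A few frames copied from the transcription above.
TRACE = """\
ret_from_intr+0x0/0x11
?panic+0x111/0x137
mce_panic+0x1e32/0x210
do_machine_check+0x7d3/0x820
machine_check+0x1c/0x30
native_safe_halt+0xb/0x10
"""

if __name__ == "__main__":
    frames = parse_frames(TRACE)
    for sym, off, size in machine_check_frames(frames):
        print(f"machine-check frame: {sym} (offset {off:#x}, size {size:#x})")
    for sym, off, size in suspicious_frames(frames):
        print(f"check transcription: {sym}+{off:#x}/{size:#x}")
```

Run against the lines above, it picks out mce_panic, do_machine_check and
machine_check, and flags the mce_panic line as one to re-check against the
photograph.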
Anyway, my first thought was that it was a problem with the upgrade
and kernel version, and some googling gave circumstantial evidence
too. I bought a new HDD yesterday, and installed Debian stable
(Lenny) onto that (after much rebooting while I tried to create the
USB stick to boot the install from).
That install ran fine, even though I picked lots of extra packages.
Only problem was X didn't configure itself properly, but a bit of time
would fix that. However, after maybe 45 minutes that system crashed
too... :-( At least with Debian I have found a likely way to crash
the system: Boot to console (remember, X not working, so this is
default). Fire up w3m, and go to nvidia pages to download the binary
nvidia drivers.
Start the download. 3 times the download of the installer from within
w3m has crashed the computer (so maybe a network/SATA connection?).
However, while writing this mail, I tried again and that succeeded.
I have also downloaded (wget) half a full-size Debian ISO, and via a
Python script have dumped 1.5GB of ASCII data to HDD without a crash.
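A minimal sketch of that kind of disk dump, for anyone who wants to try
the same test. This is not the exact script that was run; the function
name write_ascii_and_verify, the file name and the 16MB size are all
assumptions for illustration. It writes ASCII data to a file, forces it
out with fsync, then reads it back and compares SHA-256 checksums.

```python
import hashlib
import os
import tempfile


def write_ascii_and_verify(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of ASCII to path, read it back, compare digests."""
    # Repeating ASCII pattern, trimmed to exactly one chunk in length.
    pattern = (b"0123456789abcdef" * (chunk_size // 16 + 1))[:chunk_size]

    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = pattern[:min(chunk_size, remaining)]
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)

    return written.hexdigest() == read_back.hexdigest()


if __name__ == "__main__":
    # Hypothetical target: a temporary directory; point it at the disk
    # under suspicion and raise the size for a real test.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "dump.txt")
        ok = write_ascii_and_verify(target, 16 * 1024 * 1024)
        print("verified" if ok else "MISMATCH")
```

Note that the read-back will often be served from the page cache rather
than the disk itself, so a match here says more about RAM and the write
path than about what actually landed on the platters.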
I am now thinking the problem is hardware related. First thought was
memory. The computer has 4x1GB of RAM. I have tried running the
system with just 1GB using each of the modules, and have had the same
problem. I also did one run of the memtest grub option from Ubuntu,
without errors. So the RAM is (by my reasoning) unlikely to be the
problem. That might leave CPU or motherboard (or PSU). During the
Debian install I did a 10 minute CPU burn in test without issue.
So it's a long story, but to summarise my empirical observations:
1. Ubuntu Lucid boots sometimes, crashes a bit randomly. Use seems
to make it crash more quickly.
2. Debian Lenny crashes after a longer period of time. Can be made to
crash immediately by using w3m task described above, but also might
crash during boot (did that just now on me).
3. Installer for Debian Lenny ran without any problems at all. This
took well over an hour, and would have involved lots of network access
(I used a minimal iso, but took lots of packages from mirror). I'm
not sure how this gels with the hardware theory.
4. Whole thing follows from a recent upgrade, which may be unrelated
(since now I can't keep the system up for more than an hour, and
before first crash it was ok for days).
My question (finally) is what would people do next?
Am I reasonable to infer that there is a hardware issue rather than
software? (or should I try to reinstall Ubuntu 8.04 to confirm?)
If it is hardware, what should I replace? I'll maybe find it hard not
to end up buying CPU+MB+RAM as the whole setup is a bit aged (maybe 4
years old, with DDR2 RAM), but I don't want to spend a lot of money.
Apologies for the long mail, hopefully someone has some time and
insight to share on a gloomy Sunday!
Cheers,
Michael