Saturday, May 10, 2008

disk killer - bug in ubuntu

There is a bug concerning the behaviour of (laptop) hard drives under Ubuntu: very high Load_Cycle_Count numbers.

Because the bug report has 150 comments already, I've tried to summarize it here.

There appear to be two issues here: HDD spin-down and "power cycling". The first has a relatively short default timeout (60 seconds), but the second draws the most complaints (it is what drives up Load_Cycle_Count).

The disk Load_Cycle_Count issue appears to be caused by a combination of two problems. The first is overly aggressive power management from what might be considered buggy hardware. The second is that Ubuntu appears to be touching the hard drive on a regular basis for one reason or another.

Note: In the sections below on how to prevent damage to your hard disk, replace $HDD everywhere with your device, e.g. "/dev/sda" or "/dev/hda". If you have several hard drives, duplicate the relevant lines in the workarounds, one per device.

Affected hardware

Machine               | Hard Disk                         | Ubuntu     | Other OS   | Workaround  | Notes
----------------------+-----------------------------------+------------+------------+-------------+-----------------
HP nx6325             | Samsung HM250JI                   | Hardy=YES  | Vista=NO   | -B 254      |
HP nx6325             | Fujitsu MHV2060BH                 | Gutsy=YES  | N/A        | -B 255      |
Dell XPS M1330        | Western Digital WD1200BEVS-75UST0 | Hardy=YES  | N/A        | -B 255      |
HP nc8430             | Samsung 250GB                     | Hardy=YES  | N/A        | N/A         |
HP nw9440             | Seagate ST910021AS                | Edgy=YES   | N/A        | N/A         |
Gateway MT6451        | Western Digital WD1200VE          | Gutsy=YES  | N/A        | N/A         |
Thinkpad Z60m         | Hitachi HTS541080G9SA00           | Gutsy=YES  | N/A        | N/A         |
Toshiba P205          | Fujitsu MHW2120BH                 | Gutsy=YES  | N/A        | -B 255      |
N/A                   | Hitachi HTS541210H9SA00           | Ubuntu=YES | N/A        | N/A         |
Dell Inspiron 1501    | N/A                               | Ubuntu=YES | N/A        | N/A         |
Dell XPS 1530         | N/A                               | Ubuntu=YES | N/A        | N/A         |
Thinkpad R52          | Hitachi HTS541060G9AT00           | Gutsy=YES  | N/A        | -B 192      |
Thinkpad 600E         | Hitachi DK239A-65B                | Ubuntu=YES | N/A        | N/A         |
N/A                   | Hitachi HTS541010G9SA00           | Ubuntu=YES | N/A        | N/A         |
Compaq Evo600N        | N/A                               | Ubuntu=YES | N/A        | N/A         |
Dell Latitude C640    | Western Digital 40GB              | Gutsy=YES  | XP=NO      | N/A         |
Acer Aspire 1662wlm   | Hitachi 5k100 60 GB               | Ubuntu=YES | N/A        | N/A         |
MacBook Pro C2D       | Fujitsu 160GB                     | Gutsy=YES  | MACOSX=YES | N/A         |
N/A                   | Travelstar 7K100 60GB             | Ubuntu=YES | N/A        | N/A         |
Dell Inspiron 9400    | Samsung 120GB                     | Ubuntu=YES | N/A        | N/A         |
N/A                   | Toshiba MK8037GSX                 | Ubuntu=YES | Fedora=NO  | OpenSUSE=NO |
Dell Inspiron 6400    | Samsung HM120JI                   | Gutsy=YES  | N/A        | N/A         |
Dell D620             | N/A                               | Ubuntu=YES | N/A        | N/A         |
N/A                   | Seagate ST9160821AS               | Ubuntu=YES | N/A        | N/A         |
Acer Aspire 1642 WLMi | N/A                               | Ubuntu=YES | N/A        | N/A         |
Asus G1S-A1           | Hitachi HTS541616J9SA00           | Ubuntu=YES | N/A        | N/A         |
HP NX6325             | Hitachi HTS541080G9SA00           | Ubuntu=YES | N/A        | N/A         |
Dell latitude c840    | Hitachi DK23EA-30                 | Ubuntu=YES | N/A        | N/A         |
Thinkpad Z61t         | Toshiba MK1032GSX                 | Gutsy=YES  | N/A        | N/A         |
Thinkpad T60          | Hitachi HTS541080G9SA00           | Gutsy=YES  | N/A        | N/A         |
Toshiba A100          | Toshiba MK1234GSX                 | Ubuntu=YES | N/A        | -B 254      | On battery power
VIA Epia EX10000EG    | Western Digital WD10EACS          | Feisty=YES | N/A        | N/A         |
Thinkpad T23          | Seagate ST92811A                  | Ubuntu=YES | N/A        | N/A         |
N/A                   | Toshiba MK3006GAL                 | Ubuntu=YES | N/A        | -B 254      |
Thinkpad R50e         | Hitachi HTS541060G9AT00           | Feisty=YES | N/A        | N/A         |
Powerbook G4          | Seagate Momentus 5400.2 100GB     | Ubuntu=YES | N/A        | N/A         |
Dell Vostro 1000      | Seagate ST9120822AS               | Ubuntu=YES | N/A        | N/A         |
HP dv6500             | Western Digital WD800BEVS         | Ubuntu=YES | N/A        | -B 255      |
Thinkpad T42          | Samsung HD                        | Ubuntu=YES | N/A        | -B 254      |
Dell Vostro 1500      | Seagate ST9160823AS               | Gutsy=YES  | XP=YES     |             |
Dell Latitude D630    | N/A                               | Ubuntu=YES | N/A        | -B 254      |
HP dv6602au           | N/A                               | Ubuntu=YES | N/A        | N/A         |
Acer Travelmate 4010  | Hitachi IC25N060ATMR04            | Ubuntu=YES | XP=YES     | N/A         |
Thinkpad R61          | Seagate ST9160821AS               | Hardy=YES  | N/A        | N/A         |
Acer 3610             | N/A                               | Ubuntu=YES | N/A        | N/A         |
N/A                   | WDC WD5000AACS-00ZUB0             | N/A        | N/A        | N/A         |
N/A                   | Seagate ST9100824AS               | Ubuntu=YES | Vista=YES  | XP=YES      |
Asus A6Q              | Seagate ST980811AS                | Hardy=YES  | XP=NO      | -B 254      |
HP 6820s              | Hitachi HTS541616J9SA00           | Hardy=YES  | Vista=NO   | -B 255      |
N/A                   | Hitachi HTS541616J9SA00           | Ubuntu=YES | Vista=YES  | N/A         |
N/A                   | ST9120822AS                       | Hardy=YES  | XP=NO      | N/A         |
Dell Inspiron 1525n   | WDC WD2500BEVS-75UST0             | Hardy=YES  | Gutsy=NO   | -B 255      |
Dell XPS m1330        | N/A                               | Hardy=YES  | N/A        | -B 254      |


Check

You can check the current Load_Cycle_Count value of your hard drive(s) using:

  • sudo smartctl -a $HDD | grep Load_Cycle_Count

(You need the smartmontools package for this. I also had to enable SMART monitoring for my drives using sudo smartctl -s on $HDD)

The values differ a lot: it is 0 on my desktop, for example, but exceeds 600,000 for others, depending on the drive's age. TODO: add a section with sample values (including the value of Power_On_Hours).
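Because the raw count depends so much on the drive's age, it can help to divide it by Power_On_Hours. The following is only a sketch: it assumes the usual `smartctl -A` attribute table, where the attribute name is the second column and the raw value the tenth, and `cycles_per_hour` is a made-up helper name. Some drives report Power_On_Hours in other units, so treat the result as a rough estimate.

```shell
#!/bin/sh
# Sketch: average load cycles per power-on hour.
# Reads `smartctl -A` style output on stdin (assumed layout: name in
# column 2, raw value in column 10).
cycles_per_hour() {
    awk '
        $2 == "Load_Cycle_Count" { cycles = $10 + 0 }
        $2 == "Power_On_Hours"   { hours  = $10 + 0 }
        END {
            if (hours > 0) printf "%.1f\n", cycles / hours
            else print "unknown"
        }
    '
}

# Typical use (needs root and smartmontools):
# sudo smartctl -A "$HDD" | cycles_per_hour
```

A drive that parks its heads only occasionally should show a small number here; a value in the dozens per hour suggests the aggressive parking described on this page.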

What Ubuntu does

It appears to be the official policy of Ubuntu that, by default, Ubuntu should not adjust any power management settings of the hard disk. Unfortunately, this policy has two negative effects: it leaves quite a few people with broken hard drives that would otherwise not be broken, and it quite simply makes people who love Ubuntu feel neglected. This issue has been going on for a long time.

The problem appears to be that some manufacturers' defaults are too aggressive and that Ubuntu might cause too many unbuffered disk accesses -- the combination of which can cause over a thousand parks a day on some systems.
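To get a feeling for what a thousand parks a day means, here is a small arithmetic sketch. The helper name `days_until` and all three inputs are illustrative assumptions; in particular, 600,000 is used below only as an example limit, so check your drive's datasheet for its actual rated load/unload cycles.

```shell
#!/bin/sh
# days_until LIMIT CURRENT PER_DAY
# Whole days until CURRENT load cycles reach LIMIT at PER_DAY cycles per day.
days_until() {
    limit=$1
    current=$2
    per_day=$3
    if [ "$per_day" -le 0 ]; then
        echo "never"
        return 0
    fi
    remaining=$((limit - current))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo $((remaining / per_day))
}

# Example: a new drive, 1000 parks a day, illustrative limit of 600000
# days_until 600000 0 1000
```

With those example numbers the limit would be reached in 600 days, which is well under two years of daily use.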

Laptop mode is handled in /etc/acpi/power.sh. When laptop mode gets enabled, hdparm is called with "-B 1"; when it gets disabled, with "-B 254". 254 is the least aggressive setting that still leaves APM on. 255 turns APM off, but does not work for all disks.
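The behaviour described above can be sketched as follows. This is not the content of the real power.sh; the function name is hypothetical and only mirrors the two values mentioned in the text.

```shell
#!/bin/sh
# Sketch only: map the laptop-mode state to the hdparm -B level
# described in the text (1 when enabled, 254 when disabled).
apm_level_for_laptop_mode() {
    case "$1" in
        on)  echo 1 ;;    # laptop mode enabled: most aggressive power saving
        off) echo 254 ;;  # laptop mode disabled: least aggressive, APM still on
        *)   echo "usage: apm_level_for_laptop_mode on|off" >&2
             return 1 ;;
    esac
}

# Example (root required):
# hdparm -B "$(apm_level_for_laptop_mode off)" "$HDD"
```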

power.sh also sets the spin-down timeout, according to SPINDOWN_TIME from /etc/default/acpi-support (default: 12). The value is counted in units of 5 seconds, so the hard drive spins down after 12*5=60 seconds of inactivity. This does not influence Load_Cycle_Count, however.
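The 12*5=60 arithmetic follows from how hdparm encodes its -S argument. According to the hdparm manual page, values 1 to 240 are multiples of 5 seconds and values 241 to 251 are multiples of 30 minutes; the remaining values have special meanings. A small sketch (the function name is an assumption) that converts a value to seconds:

```shell
#!/bin/sh
# spindown_seconds VALUE
# Convert an hdparm -S value to a timeout in seconds.
# 0 disables the timeout; 252-255 have special meanings not decoded here.
spindown_seconds() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "disabled"
    elif [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $((v * 5))
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(((v - 240) * 1800))
    else
        echo "special"
    fi
}

# Example: the acpi-support default
# spindown_seconds 12
```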

apm

"/etc/apm" is not supposed to be used, but if it were, it would set the spin-down timeout to 60 seconds without affecting the general APM setting (hdparm -B). (There is a bug where "power_conserve" is also used here if on_ac_power returns "don't know": bug 156893.)

Debug

Blue has created a script that acts as a wrapper around hdparm and logs where it is called from (including its arguments). From his report, it appears that hdparm always gets called through apm or the init script (the logfile excerpt appears to be from booting).

TODO: provide said helper script, allowing others to help to debug this.
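Until that script is available, here is a minimal sketch of what such a wrapper could look like. It is not Blue's script. It assumes the real binary has been moved to /sbin/hdparm.real and that logging to /var/log/hdparm-wrapper.log is acceptable; both paths, the two environment variables and the function name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical hdparm wrapper: log the caller and the arguments,
# then run the real binary with the same arguments.
hdparm_logged() {
    real="${REAL_HDPARM:-/sbin/hdparm.real}"          # assumed location
    log="${HDPARM_LOG:-/var/log/hdparm-wrapper.log}"  # assumed log file
    # Command line of the calling process, e.g. an init or ACPI script.
    caller=$(ps -o args= -p "$PPID" 2>/dev/null) || caller="unknown"
    printf '%s caller=[%s] args=[%s]\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$caller" "$*" >> "$log"
    "$real" "$@"
}
```

To use it as a wrapper, save it in place of hdparm and end the file with the line `hdparm_logged "$@"`. The log then shows which script issued each hdparm call and with which -B or -S value.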

Workaround

Various workarounds have been provided that involve adjusting or even turning off power management of the hard drive. Please keep in mind that this can do more harm than good, so only apply them if you understand exactly what you are doing.

Try hdparm -B 255 $HDD or hdparm -B 254 $HDD. (255 is supposed to disable APM, but this does not work for some drives; 254 leaves APM on at its least aggressive setting.)

There are different methods to keep this setting across reboot/resume. Your mileage may vary. There may be more workarounds in the bug report, but essentially all of them use "hdparm -B" to change the APM handling of the hard drive.
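Whichever method you pick, it is worth checking that Load_Cycle_Count really stops climbing afterwards. One way, sketched here under the assumption that `smartctl -A` prints the raw value in the tenth column, is to sample the counter twice and compare. Both function names are hypothetical.

```shell
#!/bin/sh
# load_cycles: print the raw Load_Cycle_Count found in `smartctl -A`
# output on stdin; fails if the attribute is missing.
load_cycles() {
    awk '
        $2 == "Load_Cycle_Count" { print $10 + 0; found = 1 }
        END { if (!found) exit 1 }
    '
}

# cycles_added BEFORE AFTER: load cycles that happened between two samples.
cycles_added() {
    echo $(($2 - $1))
}

# Typical use (needs root and smartmontools):
# before=$(sudo smartctl -A "$HDD" | load_cycles)
# sleep 600
# after=$(sudo smartctl -A "$HDD" | load_cycles)
# cycles_added "$before" "$after"
```

If the difference over ten minutes is still in the dozens, the setting was probably not applied or was reset, for example after a resume.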

Force hdparm values in acpi hooks

Gilles posted the following workaround: Create a file called 99-fix-park.sh (keep the '99-' and the '.sh', but you can name the file as you like otherwise) with the following two lines:

#!/bin/sh
hdparm -B 254 $HDD

and copy it to the following directories: /etc/acpi/resume.d/ and /etc/acpi/start.d/ (https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/10)
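For machines with more than one drive, the two-line script can be generalized. The following is a sketch, not Gilles's script: the function name is made up, the device names in the final comment are placeholders, and the HDPARM variable exists only so the loop can be tried as a dry run.

```shell
#!/bin/sh
# Sketch of a 99-fix-park.sh variant for several drives.
# set_apm_all LEVEL DEVICE...
set_apm_all() {
    level=$1
    shift
    for dev in "$@"; do
        # HDPARM defaults to the real hdparm; set HDPARM=echo for a dry run.
        "${HDPARM:-hdparm}" -B "$level" "$dev" ||
            echo "failed to set APM level on $dev" >&2
    done
}

# In the installed script, finish with a call such as:
# set_apm_all 254 /dev/sda /dev/sdb
```

A commenter on this page suspects the script needs to be executable; running `chmod +x` on both copies does no harm either way.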

laptop-mode-tools

Don posted another workaround: Install laptop-mode-tools and set CONTROL_HD_POWERMGMT=1 in /etc/laptop-mode/laptop-mode.conf (https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/19). Here's another, more verbose setup of laptop-mode-tools from Michael: https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/63
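The setting can of course be changed in any editor. If you would rather script it, the following sketch flips CONTROL_HD_POWERMGMT to 1 in a file of that format. The function name is hypothetical, and it assumes the file uses plain NAME=value lines.

```shell
#!/bin/sh
# enable_hd_powermgmt FILE
# Set CONTROL_HD_POWERMGMT=1 in a laptop-mode.conf style file.
enable_hd_powermgmt() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Rewrite an existing assignment, commented out or not; other lines stay.
    sed 's/^#*[[:space:]]*CONTROL_HD_POWERMGMT=.*/CONTROL_HD_POWERMGMT=1/' \
        "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # cat keeps the original owner and mode
    rm -f "$tmp"
    # Append the setting if the file had no such line at all.
    grep -q '^CONTROL_HD_POWERMGMT=1$' "$conf" ||
        echo 'CONTROL_HD_POWERMGMT=1' >> "$conf"
}

# Example (root required, keep a backup first):
# enable_hd_powermgmt /etc/laptop-mode/laptop-mode.conf
```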

An easy, step-by-step walkthrough for a situation-sensitive solution (AC/batteries/heat) can be found here: http://vale.homelinux.net/wordpress/?p=199

Proposed fixes

  • FIXED: Converted most remaining 255's to 254s, and added an explanation.

Conclusion

This bug report has attracted a lot of concerned Ubuntu users, and it seems quite clear from the user feedback that other operating systems/distributions handle this better. However, the workaround should be quite simple, and this wiki page is a first attempt to fix this for the better.

Misc

ubuntu_demon has put together a list of TODOs: https://launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/81

Comments

Please leave any comments/additions here. You may also edit the page directly, but please try to be clear and helpful. The problem has been confirmed and we know that it's a critical thing - please do not repeat that the bug status should be critical.

Could you kindly explain how to diagnose whether the settings on a system, at a given point in time, are correct? I have followed the instructions but still find the load cycle count increasing. How do I diagnose the problem?

I think the 99-fix-park.sh scripts should be executable. If this is the case then I think we should add a note to the workaround.
