Thread View: gmane.linux.debian.user
5 messages, started by Tomek Kruszona on Tue, 09 Jun 2009 14:16
XFS frequent crashes on PE1950 with perc 5/e and 2xMD1000
#307550
Author: Tomek Kruszona
Date: Tue, 09 Jun 2009 14:16
293 lines
12394 bytes
Hello!

I have a problem with a system in the configuration described in the
subject (Dell PE1950 III + PERC 5/E + 2xMD1000).

The system is running Debian Lenny AMD64 with all available updates.

I have 6 VDs of 2 TB each (for 32-bit system compatibility). Each VD is
an LVM2 PV.

I made an LVM2 volume and formatted it as XFS. Previously only one
MD1000 was connected to the PERC controller.

But two days ago I added the second MD1000, added the new PVs to LVM2,
and extended the XFS filesystem with xfs_growfs.
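
For reference, an expansion like the one just described typically boils
down to the following sequence; the device, VG, and LV names here are
placeholders, not the actual layout on this machine:

  # make the new virtual disk an LVM2 physical volume
  pvcreate /dev/sdX
  # add it to the existing volume group
  vgextend datavg /dev/sdX
  # give the logical volume all of the newly added space
  lvextend -l +100%FREE /dev/datavg/datalv
  # grow the mounted XFS filesystem to fill the enlarged volume
  xfs_growfs /mount/point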

After some time I got a kernel panic like this:

[46925.374954] Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c.  Caller 0xffffffffa02b2e82
[46925.374954] Pid: 12269, comm: smbd Not tainted 2.6.26-2-amd64 #1
[46925.374954]
[46925.374954] Call Trace:
[46925.374954]  [<ffffffffa02b2e82>] :xfs:xfs_iomap_write_allocate+0x360/0x385
[46925.374954]  [<ffffffffa02c095e>] :xfs:xfs_trans_cancel+0x55/0xed
[46925.374954]  [<ffffffffa02b2e82>] :xfs:xfs_iomap_write_allocate+0x360/0x385
[46925.374954]  [<ffffffffa02b38f4>] :xfs:xfs_iomap+0x21b/0x297
[46925.374954]  [<ffffffffa02c9637>] :xfs:xfs_map_blocks+0x2d/0x5f
[46925.374954]  [<ffffffffa02ca74e>] :xfs:xfs_page_state_convert+0x2a2/0x54f
[46925.374954]  [<ffffffffa02cab5a>] :xfs:xfs_vm_writepage+0xb4/0xea
[46925.374954]  [<ffffffff802770db>] __writepage+0xa/0x23
[46925.374954]  [<ffffffff802775a0>] write_cache_pages+0x182/0x2b1
[46925.374954]  [<ffffffff802770d1>] __writepage+0x0/0x23
[46925.374954]  [<ffffffff8027770b>] do_writepages+0x20/0x2d
[46925.374954]  [<ffffffff80271900>] __filemap_fdatawrite_range+0x51/0x5b
[46925.374954]  [<ffffffffa02cd2b0>] :xfs:xfs_flush_pages+0x4e/0x6d
[46925.374954]  [<ffffffffa02c5548>] :xfs:xfs_setattr+0x695/0xd28
[46925.374954]  [<ffffffff803b1326>] sock_common_recvmsg+0x30/0x45
[46925.374954]  [<ffffffffa02d08cc>] :xfs:xfs_write+0x6de/0x722
[46925.374954]  [<ffffffffa02cf1e3>] :xfs:xfs_vn_setattr+0x11c/0x13a
[46925.374954]  [<ffffffff802add8f>] notify_change+0x174/0x2f5
[46925.374954]  [<ffffffff80299f09>] do_truncate+0x5e/0x79
[46925.374954]  [<ffffffff8029df53>] sys_newfstat+0x20/0x29
[46925.374954]  [<ffffffff8029a00e>] sys_ftruncate+0xea/0x107
[46925.374954]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
[46925.374954]
[46925.374954] xfs_force_shutdown(dm-0,0x8) called from line 1164 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa02c0977
[46925.374954] Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down filesystem: dm-0
[46925.376874] Please umount the filesystem, and rectify the problem(s)
[46934.390143] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47112.408317] Pid: 15211, comm: umount Tainted: G      D    2.6.26-2-amd64 #1
[47112.408317]
[47112.408317] Call Trace:
[47112.408317]  [<ffffffff80234a20>] warn_on_slowpath+0x51/0x7a
[47112.408317]  [<ffffffff802b6c5b>] __mark_inode_dirty+0xe0/0x179
[47112.408317]  [<ffffffff802460ef>] bit_waitqueue+0x10/0x97
[47112.424115]  [<ffffffff802461b4>] wake_up_bit+0x11/0x22
[47112.424196]  [<ffffffff802b6364>] __writeback_single_inode+0x44/0x29d
[47112.424280]  [<ffffffff802b6928>] sync_sb_inodes+0x1b1/0x293
[47112.424362]  [<ffffffff802b6aa4>] sync_inodes_sb+0x9a/0xa6
[47112.424445]  [<ffffffff8029c6ed>] __fsync_super+0xb/0x6f
[47112.424527]  [<ffffffff8029c75a>] fsync_super+0x9/0x16
[47112.424608]  [<ffffffff8029c976>] generic_shutdown_super+0x21/0xee
[47112.424692]  [<ffffffff8029ca50>] kill_block_super+0xd/0x1e
[47112.424773]  [<ffffffff8029cb0c>] deactivate_super+0x5f/0x78
[47112.424855]  [<ffffffff802afe06>] sys_umount+0x2f9/0x353
[47112.424938]  [<ffffffff80221fac>] do_page_fault+0x5d8/0x9c8
[47112.428111]  [<ffffffff8029e0e4>] sys_newstat+0x19/0x31
[47112.428111]  [<ffffffff8031dc73>] __up_write+0x21/0x10e
[47112.428111]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
[47112.428111]
[47112.428111] ---[ end trace ba717a82a77cfd6a ]---
[47112.428111] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47112.428111] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47112.428111] xfs_force_shutdown(dm-0,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xffffffffa02c8d33
[47112.428111] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47112.428111] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47112.428111] xfs_force_shutdown(dm-0,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xffffffffa02c8d33
[47112.428177] ------------[ cut here ]------------
[47112.428246] WARNING: at fs/fs-writeback.c:381 __writeback_single_inode+0x44/0x29d()
[47112.428345] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler ipv6 xfs ext2 mbcache loop snd_pcm snd_timer snd soundcore snd_page_alloc rng_core psmouse i5000_edac iTCO_wdt button pcspkr serio_raw edac_core shpchp pci_hotplug dcdbas evdev reiserfs dm_mirror dm_log dm_snapshot dm_mod raid1 md_mod sg sr_mod cdrom ide_pci_generic ide_core ses enclosure ata_piix sd_mod e1000e megaraid_sas bnx2 firmware_class ata_generic libata dock uhci_hcd ehci_hcd mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal processor fan thermal_sys
[47112.432208] Pid: 15211, comm: umount Tainted: G      D W    2.6.26-2-amd64 #1
[47112.432294]
[47112.432304] Call Trace:
[47112.432443]  [<ffffffff80234a20>] warn_on_slowpath+0x51/0x7a
[47112.432528]  [<ffffffff80278daa>] pagevec_lookup_tag+0x1a/0x21
[47112.432614]  [<ffffffff80271846>] wait_on_page_writeback_range+0xc8/0x113
[47112.432709]  [<ffffffff802b6c5b>] __mark_inode_dirty+0xe0/0x179
[47112.432794]  [<ffffffff802460ef>] bit_waitqueue+0x10/0x97
[47112.432876]  [<ffffffff802461b4>] wake_up_bit+0x11/0x22
[47112.432959]  [<ffffffff802b6364>] __writeback_single_inode+0x44/0x29d
[47112.433073]  [<ffffffffa02c8d33>] :xfs:xfs_bwrite+0xb0/0xbb
[47112.433161]  [<ffffffffa02b5993>] :xfs:xfs_log_need_covered+0x15/0x8c
[47112.433242]  [<ffffffff802b6928>] sync_sb_inodes+0x1b1/0x293
[47112.433328]  [<ffffffff802b6aa4>] sync_inodes_sb+0x9a/0xa6
[47112.433411]  [<ffffffff8029c75a>] fsync_super+0x9/0x16
[47112.433493]  [<ffffffff8029c976>] generic_shutdown_super+0x21/0xee
[47112.433577]  [<ffffffff8029ca50>] kill_block_super+0xd/0x1e
[47112.433661]  [<ffffffff8029cb0c>] deactivate_super+0x5f/0x78
[47112.433743]  [<ffffffff802afe06>] sys_umount+0x2f9/0x353
[47112.433825]  [<ffffffff80221fac>] do_page_fault+0x5d8/0x9c8
[47112.433908]  [<ffffffff8029e0e4>] sys_newstat+0x19/0x31
[47112.433994]  [<ffffffff8031dc73>] __up_write+0x21/0x10e
[47112.434078]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
[47112.434147]
[47112.434147] ---[ end trace ba717a82a77cfd6a ]---
[47113.504506] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47113.504506] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47113.504506] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47113.504506] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47113.506718] Filesystem "dm-0": xfs_log_force: error 5 returned.
[47113.516457] VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds.  Have a nice day...


I've found some similar issues but no solution :(

Here is my

$ omreport storage vdisk controller=0

output:

List of Virtual Disks on Controller PERC 5/E Adapter (Slot 1)

Controller PERC 5/E Adapter (Slot 1)
ID                  : 0
Status              : Ok
Name                : Array0
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 1,953.12 GB (2097149902848 bytes)
Device Name         : /dev/sdc
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled

ID                  : 1
Status              : Ok
Name                : Array1
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 1,951.13 GB (2095006613504 bytes)
Device Name         : /dev/sdd
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled

ID                  : 2
Status              : Ok
Name                : Array2
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 1,953.12 GB (2097151737856 bytes)
Device Name         : /dev/sde
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled

ID                  : 3
Status              : Ok
Name                : Array3
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 1,953.12 GB (2097151737856 bytes)
Device Name         : /dev/sdf
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled

ID                  : 4
Status              : Ok
Name                : Array4
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 1,953.12 GB (2097151737856 bytes)
Device Name         : /dev/sdg
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled

ID                  : 5
Status              : Ok
Name                : Array5
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 1,957.88 GB (2102253060096 bytes)
Device Name         : /dev/sdh
Type                : SAS
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Not Applicable
Stripe Element Size : 64 KB
Disk Cache Policy   : Disabled

I was thinking... maybe XFS on LVM2 requires some specific PERC VD
setup? I had the same issue on 32-bit Gentoo with a 2.6.25 kernel and a
single MD1000, but there the problem happened about once a month. Now
it's getting worse: 2 crashes in the last 24 hours :(
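
For what it's worth, the stripe geometry XFS believes it has can at
least be compared against the PERC layout; the values below are purely
illustrative (a 64 KB stripe element and a 6-disk RAID-5), not a
known-good setting for this hardware:

  # show the geometry the existing filesystem was created with
  # (sunit/swidth are reported in filesystem blocks)
  xfs_info /mount/point
  # at mkfs time the alignment can be stated explicitly, for example
  # (only relevant when creating a new filesystem, shown for illustration):
  mkfs.xfs -d su=64k,sw=5 /dev/datavg/datalv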

Any ideas?

Best regards,
Tomek Kruszona

Re: XFS frequent crashes on PE1950 with perc 5/e and 2xMD1000
#307676
Author: Andrew Reid
Date: Wed, 10 Jun 2009 22:00
39 lines
1230 bytes
On Tuesday 09 June 2009 08:16:35 Tomek Kruszona wrote:
> Hello!
>
> I have a problem with a system in the configuration described in the
> subject (Dell PE1950 III + PERC 5/E + 2xMD1000).
>
> The system is running Debian Lenny AMD64 with all available updates.
>
> I have 6 VDs of 2 TB each (for 32-bit system compatibility). Each VD is
> an LVM2 PV.
>
> I made an LVM2 volume and formatted it as XFS. Previously only one
> MD1000 was connected to the PERC controller.
>
> But two days ago I added the second MD1000, added the new PVs to LVM2,
> and extended the XFS filesystem with xfs_growfs.
>
> After some time I got a kernel panic like this:

  This strongly resembles an issue I had on a file server --
I don't have my notes handy, but it had to do with the kernel
interacting badly with a particular motherboard chipset.

  The workaround was to reboot with the "iommu=soft" option
passed to the kernel.
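
  (On a Lenny-era box still using GRUB legacy, that typically means
adding the option to the "# kopt=" line in /boot/grub/menu.lst and
regenerating the menu; the root= value below is only a placeholder.)

  # in /boot/grub/menu.lst, extend the kopt line that update-grub reads:
  #   # kopt=root=/dev/mapper/vg-root ro iommu=soft
  update-grub
  # after rebooting, confirm the option is active:
  cat /proc/cmdline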

  My problem was with an "etch" kernel, and it was my understanding
that newer kernels were not expected to have this problem, so
I may be off-base, but that's my experience.

  It sounds like this is at least an easy thing to try -- I really
wish I could find my notes...

				-- A.

--
Andrew Reid / reidac@bellatlantic.net

Re: XFS frequent crashes on PE1950 with perc 5/e and 2xMD1000
#307677
Author: Kelly Harding
Date: Thu, 11 Jun 2009 03:17
28 lines
921 bytes
2009/6/9 Tomek Kruszona <bloodyscarion@gmail.com>:
> Hello!
>
> I have a problem with a system in the configuration described in the
> subject (Dell PE1950 III + PERC 5/E + 2xMD1000).
>
> The system is running Debian Lenny AMD64 with all available updates.
>
> I have 6 VDs of 2 TB each (for 32-bit system compatibility). Each VD is
> an LVM2 PV.
>
> I made an LVM2 volume and formatted it as XFS. Previously only one
> MD1000 was connected to the PERC controller.
>
> But two days ago I added the second MD1000, added the new PVs to LVM2,
> and extended the XFS filesystem with xfs_growfs.
>

Might be a bit of an obvious thing, but have you tried running memtest
to rule out dodgy memory? Usually when I see anything similar to this I
run a memtest to be sure (on a few occasions it has proven to be the
memory).
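
(On Debian of this era, one common route is to install memtest86+ and
let update-grub add a boot entry for it; the exact boot-menu
integration can vary, so treat this as a sketch rather than a recipe.)

  apt-get install memtest86+
  update-grub
  # reboot, choose the memtest86+ entry from the boot menu, and let it
  # run several complete passes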

Could it also be a driver bug related to multiple MD1000s? Sadly I have
no experience with Dell PERC hardware, so I can't be of any further help.

Kelly

Re: XFS frequent crashes on PE1950 with perc 5/e and 2xMD1000
#307705
Author: Tomek Kruszona
Date: Thu, 11 Jun 2009 18:24
17 lines
670 bytes
Andrew Reid wrote:
>   This strongly resembles an issue I had on a file server --
> I don't have my notes handy, but it had to do with the kernel
> interacting badly with a particular motherboard chipset.
>
>   The workaround was to reboot with the "iommu=soft" option
> passed to the kernel.
>
>   My problem was with an "etch" kernel, and it was my understanding
> that newer kernels were not expected to have this problem, so
> I may be off-base, but that's my experience.
>
>   It sounds like this is at least an easy thing to try -- I really
> wish I could find my notes...
I'll try this option. I just need to wait for the next crash ;)

Re: XFS frequent crashes on PE1950 with perc 5/e and 2xMD1000
#307706
Author: Tomek Kruszona
Date: Thu, 11 Jun 2009 18:25
17 lines
611 bytes
Kelly Harding wrote:
> Might be a bit of an obvious thing, but have you tried running memtest
> to rule out dodgy memory? Usually when I see anything similar to this I
> run a memtest to be sure (on a few occasions it has proven to be the
> memory).
Memory is OK, memtest passed. Moreover, it's happening on more than one
machine.

>
> Could it also be a driver bug related to multiple MD1000s? Sadly I have
> no experience with Dell PERC hardware, so I can't be of any further help.

I don't think so. I had this issue before, when only one MD1000 was
connected to the PERC. It also happens with an LSI 8880EM2 controller.

Best regards
