Over time, storage devices can acquire uncorrectable media errors often called “bad blocks”. A bad block is a part of a storage media that is either inaccessible or unwritable due to a permanent physical damage. In case of memory mapped I/O, if a process tries to access (read or write) the corrupted block, it will be terminated by the SIGBUS signal.
PMDK libraries can handle bad blocks if the CHECK_BAD_BLOCKS
compat feature
is turned on. Currently (PMDK v1.5) it is disabled by default
because it requires super user privileges. It can be turned on using
pmempool-feature.
If the CHECK_BAD_BLOCKS
compat feature is turned on, several features
are available:
pmempool-info
prints out information about bad blocks in the poolpmempool-check
checks if the pool contains bad blockspmempool-sync
tries to fix bad blocks in libpmemobj pool using its replicasUsing pmempool-feature
one can enable or disable the CHECK_BAD_BLOCKS
compat feature:
$ pmempool feature --enable CHECK_BAD_BLOCKS ./poolset.file
$ pmempool feature --disable CHECK_BAD_BLOCKS ./poolset.file
The CHECK_BAD_BLOCKS
compat feature enables checking and fixing bad blocks.
Currently (Linux kernel v4.19, libndctl v62) these operations require
read access to the following resource files (containing physical addresses)
of NVDIMM devices which only the super user can read by default:
/sys/bus/nd/devices/ndbus*/region*/resource
/sys/bus/nd/devices/ndbus*/region*/dax*/resource
/sys/bus/nd/devices/ndbus*/region*/pfn*/resource
/sys/bus/nd/devices/ndbus*/region*/namespace*/resource
If the CHECK_BAD_BLOCKS
compat feature is enabled, then the pool is checked
if it contains bad blocks during opening and during creating
(when the pool is created using an already existing zeroed file(s)).
If it does then opening/creating fails.
If the CHECK_BAD_BLOCKS
compat feature is enabled and the user does not have
enough permissions (see pmempool-feature) to be able
to check if the pool contains bad blocks then opening/creating fails either.
If the CHECK_BAD_BLOCKS
compat feature is enabled or --bad-blocks=yes
option is used then pmempool-info
prints out information about bad blocks
in the pool, for example:
$ pmempool info --bad-blocks=yes ./poolset.file
Poolset structure:
Number of replicas : 1
Replica 0 (master) - local, 1 part(s):
part 0:
path : /dev/dax1.0
type : device dax
size : 62922752
alignment : 4096
bad blocks:
offset length
11 1
[...]
If the CHECK_BAD_BLOCKS
compat feature is enabled, pmempool-check
checks
if the pool contains bad blocks, for example:
$ pmempool check ./poolset.file
poolset contains bad blocks, use 'pmempool info --bad-blocks=yes' to print or 'pmempool sync --bad-blocks' to clear them
./poolset.file: cannot repair
Attention: this feature is available only for libpmemobj pools.
If the CHECK_BAD_BLOCKS
compat feature is enabled or --bad-blocks
option is used then pmempool-sync
tries to fix bad blocks in the libpmemobj
pool using its replicas:
$ pmempool sync --bad-blocks ./poolset.file
./poolset.file: synchronized
Synchronization can fail if a part of the pool has uncorrectable errors in all replicas:
$ pmempool sync --bad-blocks ./poolset.file
error: failed to synchronize: a part of the pool has uncorrectable errors in all replicas
error: Invalid argument
Fixing bad blocks causes creating or reading special recovery files. When bad blocks are detected, special recovery files have to be created in order to fix them safely. A separate recovery file is created per each part of the pool. The recovery files are created in the same directory as the poolset file, using the following name pattern:
<poolset-file-name>_r<replica-number>_p<part-number>_badblocks.txt
for example:
poolset.file_r0_p0_badblocks.txt
for part #0 of replica #0. These recovery files are automatically removed if the sync operation finishes successfully.
If the last sync operation was interrupted and not finished correctly
(eg. pmempool crashed) and the bad blocks fixing procedure was
in progress, the bad block recovery files may be left over. In such case
bad blocks might have been cleared and zeroed, but the correct data from these
blocks was not recovered (not copied from a healthy replica), so the recovery
files MUST NOT be deleted manually, because it would cause a data loss.
If bad block recovery files are present, opening a pool will always fail.
In such case pmempool-sync
should be run again with the --bad-blocks
option.
It will finish the previously interrupted sync operation and copy correct data
to zeroed bad blocks using the left-over bad block recovery files
(the bad blocks will be read from the saved recovery files). Pmempool will
delete the recovery files automatically at the end of the sync operation.