💾 Archived View for tanso.net › notes › gpfs-poor-mans-checksumming.gmi captured on 2020-11-07 at 07:44:25. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Spectrum Scale supports end-to-end checksumming if you have an ESS (Elastic Storage Server), but not with other storage. As a poor man's alternative, one can store a checksum for each file in its extended attributes. This makes it possible to check later whether a file has suffered bitrot, without leaving .md5sum files lying around everywhere. This can be done manually with the "mmchattr" command:
```
# md5sum testfile
d2aa3e121b13274d84c8388f453b0ca5  testfile
# mmchattr --set-attr user.cksum=md5:d2aa3e121b13274d84c8388f453b0ca5 testfile
# mmlsattr --get-attr user.cksum testfile
file name: testfile
user.cksum: "md5:d2aa3e121b13274d84c8388f453b0ca5"
```
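The stored value is only useful if it is compared against the file's current contents later on. A minimal bash sketch of such a check, assuming the "mmlsattr --get-attr" output format shown above (a user.cksum line with the value in double quotes). The function names parse_cksum, current_cksum and verify_file are hypothetical helpers, not part of Spectrum Scale:

```shell
#!/bin/bash
# Hypothetical helpers for checking a file against its stored user.cksum.

# Extract the quoted value from mmlsattr-style output on stdin, e.g.
#   user.cksum: "md5:d2aa3e121b13274d84c8388f453b0ca5" -> md5:d2aa3e...
parse_cksum() {
  sed -n 's/^user\.cksum:[[:space:]]*"\(.*\)"$/\1/p'
}

# Recompute the checksum in the same "md5:<hash>" form used when storing it.
current_cksum() {
  local out
  out=$(md5sum < "$1") || return 1   # reading stdin yields "<hash>  -"
  printf 'md5:%s\n' "${out%% *}"
}

# Print OK/FAIL and return non-zero if the stored checksum does not match.
verify_file() {
  local file=$1 stored current
  stored=$(mmlsattr --get-attr user.cksum "$file" | parse_cksum)
  current=$(current_cksum "$file") || return 1
  if [ -n "$stored" ] && [ "$stored" = "$current" ]; then
    echo "OK   $file"
  else
    echo "FAIL $file (stored: ${stored:-none}, current: $current)"
    return 1
  fi
}
```

Calling "verify_file testfile" prints OK or FAIL, and the non-zero return status on a mismatch makes it easy to use in a loop.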
To set this attribute efficiently on all files, we can use the policy engine. First we create a script that parses a file list produced by the policy engine and sets the checksum attribute on every file listed:
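```
# cat <<'EOF' > /root/apply-checksum.sh
#! /bin/bash -
#
# Arguments to this script:
#
#   $@ = LIST /mnt/gpfs01/.mmSharedTmpDir/mmPolicy.ix.19751.02F6A4F0.1 7
#
case $1 in
  TEST )
    # mmapplypolicy calls the script with TEST first; just report success
    exit 0
    ;;
  LIST )
    # Records look like "inode gen snapid -- /path/to/file". Keep everything
    # after the first " -- " so paths with spaces (or " -- ") stay intact.
    awk '{ print substr($0, index($0, " -- ") + 4) }' "$2" |
    while IFS= read -r file
    do
      # Hash via stdin: md5sum prints "<hash>  -", so the file name is
      # neither escaped nor split; skip files that cannot be read.
      MD5SUM=$(md5sum < "$file") || continue
      mmchattr --set-attr user.cksum=md5:"${MD5SUM%% *}" "$file"
    done
    ;;
esac
EOF
# chmod +x /root/apply-checksum.sh
```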
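The parsing loop can be tried out on a machine without Spectrum Scale before running it for real. The sketch below assumes list records of the form "inode gen snapid -- /path/to/file" and replaces mmchattr with a hypothetical stub function that only prints the command it would run; the temporary directory, the file name and the sample record are made up for illustration:

```shell
#!/bin/bash
# Dry run of a LIST-style loop without Spectrum Scale installed.

# Hypothetical stub: print the mmchattr call instead of executing it.
mmchattr() { echo "mmchattr $*"; }

# Take everything after the first " -- " and hash each listed file.
process_list() {
  awk '{ print substr($0, index($0, " -- ") + 4) }' "$1" |
  while IFS= read -r file
  do
    MD5SUM=$(md5sum < "$file") || continue
    mmchattr --set-attr user.cksum=md5:"${MD5SUM%% *}" "$file"
  done
}

# Throwaway directory with a file whose name contains a space.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/my file.txt"
printf '1234 5 0 -- %s\n' "$demo_dir/my file.txt" > "$demo_dir/list"

process_list "$demo_dir/list"
```

The printed line shows the exact attribute value and path that would be passed to mmchattr, which makes it easy to spot parsing problems with unusual file names.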
Then we create an external list policy that calls this script for all files that do not yet have the user.cksum attribute:
```
# cat <<'EOF' > apply-checksum.policy
RULE EXTERNAL LIST 'setChecksum' EXEC '/root/apply-checksum.sh'
RULE 'findNonChecksummed' LIST 'setChecksum' WHERE xattr('user.cksum') IS NULL
EOF
```
And run it using:
```
# mmapplypolicy /path/to/folder -P apply-checksum.policy -I yes
```
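The stored checksums only pay off if they are checked again later. One possible approach, sketched here under assumptions, is a second external list whose rule adds the stored value to each record with SHOW(xattr('user.cksum')), so the script does not have to call mmlsattr for every file. It assumes the shown value ends up as the last field before " -- ", i.e. records like "inode gen snapid md5:<hash> -- /path"; the script name verify-checksum.sh, the list name verifyChecksum and the rule name findChecksummed are hypothetical:

```shell
#!/bin/bash
# Hypothetical /root/verify-checksum.sh: report files whose contents no
# longer match the stored user.cksum attribute.
#
# Assumed policy (e.g. saved as verify-checksum.policy):
#   RULE EXTERNAL LIST 'verifyChecksum' EXEC '/root/verify-checksum.sh'
#   RULE 'findChecksummed' LIST 'verifyChecksum'
#        SHOW(xattr('user.cksum')) WHERE xattr('user.cksum') IS NOT NULL

verify_list() {
  local record prefix file stored out fields
  while IFS= read -r record
  do
    prefix=${record%% -- *}   # inode, gen, snapid and the SHOW value
    file=${record#* -- }      # path: everything after the first " -- "
    read -r -a fields <<< "$prefix"
    stored=${fields[${#fields[@]}-1]}   # last field = stored checksum
    if ! out=$(md5sum < "$file"); then
      echo "UNREADABLE $file"
      continue
    fi
    if [ "$stored" != "md5:${out%% *}" ]; then
      echo "MISMATCH $file (stored: $stored, current: md5:${out%% *})"
    fi
  done < "$1"
}

case $1 in
  TEST ) exit 0 ;;
  LIST ) verify_list "$2" ;;
esac
```

It would be run the same way as the apply policy, e.g. "mmapplypolicy /path/to/folder -P verify-checksum.policy -I yes", and only files whose contents have changed since the checksum was stored are reported.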
This attribute can then be used, for example, in a migration policy to make sure files are only migrated to Spectrum Archive or TCT once a checksum has been set for them:
```
RULE 'archivePoolRule' EXTERNAL POOL 'ltfsee'
     EXEC '/opt/ibm/ltfsee/bin/eeadm'
     OPTS '-p pool1@library1,pool2@library2'

RULE 'migrateOffline' MIGRATE TO POOL 'ltfsee'
     WHERE xattr('user.cksum') IS NOT NULL
```