The original post: /r/datahoarder by /u/OniHanz on 2024-12-22 10:11:15.

Hey everyone,

I’m working on a storage/backup project that spans two sites, and I’d love your input. Here’s a quick rundown:

Site 1 (Main Storage)
• 8×20 TB in RAID6 (~110 TB usable).
• Acts as the primary storage, with redundancy for up to two disk failures.

Site 2 (Backup)
• 6×16 TB + 2×8 TB (~112 TB total).
• No RAID here; I’m planning to use MergerFS to unify the disks into a single pool.

The Plan

1.  Automated Transfers: I want to use Ansible to automate data synchronization from Site 1 to Site 2, likely driving rsync or rclone for the incremental copies (see the first sketch after this list).

2.  Data Indexing: Because Site 2 has no RAID, I’m setting up an index that tracks which disk holds which files. That way, if a disk fails, I’ll know exactly what data was on it (see the second sketch below).
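
For reference, here’s roughly what I have in mind for the sync step — a minimal Python sketch that just shells out to rsync. The mount points and the `backup@site2.example` host are placeholders, and in practice this command line would live in an Ansible task rather than a standalone script:

    #!/usr/bin/env python3
    """Minimal sketch of the Site 1 -> Site 2 sync step.

    Paths and host are placeholders; the real version would be
    templated into an Ansible task.
    """
    import subprocess

    SRC = "/mnt/raid6/"                      # Site 1 array (placeholder mount)
    DST = "backup@site2.example:/mnt/pool/"  # Site 2 MergerFS pool (placeholder)

    def sync() -> None:
        # -a preserves attributes, -H keeps hard links,
        # --partial resumes interrupted transfers,
        # --delete mirrors deletions so Site 2 stays an exact copy.
        cmd = [
            "rsync", "-aH", "--partial", "--delete",
            "--info=stats2",  # print a summary of what changed
            SRC, DST,
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        sync()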
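
And here’s a rough sketch of the indexing idea: scan each MergerFS branch (the individual disk mounts) rather than the pooled mount, so every file gets recorded against the physical disk it lives on. The `/mnt/disk1` … `/mnt/disk8` branch paths and the output location are placeholders for whatever the real layout ends up being:

    #!/usr/bin/env python3
    """Minimal sketch of the per-disk file index for Site 2.

    Scans each MergerFS branch so the index maps files to
    physical disks. Branch and output paths are placeholders.
    """
    import csv
    import os
    from pathlib import Path

    BRANCHES = [Path(f"/mnt/disk{i}") for i in range(1, 9)]  # placeholder branch mounts
    INDEX = Path("/var/lib/disk-index/index.csv")            # placeholder output path

    def build_index() -> None:
        INDEX.parent.mkdir(parents=True, exist_ok=True)
        with INDEX.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["disk", "path", "size_bytes", "mtime"])
            for branch in BRANCHES:
                if not branch.is_dir():
                    continue  # skip disks that aren't mounted
                for root, _dirs, files in os.walk(branch):
                    for name in files:
                        p = Path(root, name)
                        try:
                            st = p.stat()
                        except OSError:
                            continue  # file vanished mid-scan
                        writer.writerow([
                            branch.name,
                            str(p.relative_to(branch)),
                            st.st_size,
                            int(st.st_mtime),
                        ])

    if __name__ == "__main__":
        build_index()

The idea would be to re-run this from Ansible (or cron) right after each sync, so the index always reflects the current state of the disks.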

What do you think about this approach? Any suggestions for improvement?

Thanks in advance for your insights!