diff --git a/en/projects/bigdisk/Makefile b/en/projects/bigdisk/Makefile new file mode 100644 index 0000000000..2a29ffa58a --- /dev/null +++ b/en/projects/bigdisk/Makefile @@ -0,0 +1,17 @@ +# Summary of work needed to support large disks and arrays. +# +# $FreeBSD$ + +MAINTAINER= scottl + +.if exists(../Makefile.conf) +.include "../Makefile.conf" +.endif +.if exists(../Makefile.inc) +.include "../Makefile.inc" +.endif + +DOCS= index.sgml +DATA= style.css + +.include "${WEB_PREFIX}/share/mk/web.site.mk" diff --git a/en/projects/bigdisk/index.sgml b/en/projects/bigdisk/index.sgml new file mode 100644 index 0000000000..0b0182f0e7 --- /dev/null +++ b/en/projects/bigdisk/index.sgml @@ -0,0 +1,216 @@ + + + + %includes; + + +N/A"> +Done"> +In progress"> +Needs testing"> +Not done"> +Unknown"> + + + + %developers; + +]> + + + &header; + +

Contents

+ + + +

Purpose and background

+

The UFS filesystem

+

When the UFS filesystem was introduced to BSD in 1982, its use of 32-bit + offsets and counters to address storage was considered to be ahead of + its time. Since most fixed-disk storage devices use 512-byte sectors, + 32-bit sector addresses allowed for 2^32 * 512 bytes, or 2 terabytes, of + storage, an almost unimaginable quantity at the time. But now that 250 + and 400 gigabyte disks are available at consumer prices, it's trivial to + build a hardware- or software-based storage array that exceeds 2TB for a + few thousand dollars.
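To make the 2TB figure concrete, the C sketch below (illustrative only, not taken from FreeBSD) converts a byte offset into a sector number held in a 32-bit variable. The helper names are hypothetical and a fixed 512-byte sector is assumed. An offset of exactly 2^32 * 512 bytes wraps around to sector 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sector size: 512 bytes, as on most fixed disks of the era. */
#define SECTOR_SIZE 512u

/* Hypothetical helper: byte offset -> sector number stored in 32 bits.
 * The cast silently discards any bits above bit 31. */
uint32_t sector_32(uint64_t byte_offset)
{
    return (uint32_t)(byte_offset / SECTOR_SIZE);
}

/* Hypothetical helper: the same conversion with a 64-bit result. */
uint64_t sector_64(uint64_t byte_offset)
{
    return byte_offset / SECTOR_SIZE;
}

/* Bytes addressable with 32-bit sector numbers: 2^32 * 512 = 2^41. */
uint64_t max_bytes_32(void)
{
    return ((uint64_t)UINT32_MAX + 1) * SECTOR_SIZE;
}
```

A tool or driver that truncated sector numbers this way would not merely fail past 2TB; it could silently address the beginning of the disk instead.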

+ +

The UFS2 filesystem was introduced in 2003 as a replacement for the + original UFS and provides 64-bit counters and offsets. This allows + files and filesystems to grow to 2^73 bytes (2^64 * 512) in size, which + will hopefully be sufficient for quite a long time. UFS2 largely solved + the storage size limits imposed by the filesystem. Unfortunately, many + tools and storage mechanisms still use or assume 32-bit values, often + keeping FreeBSD limited to 2TB.
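Both limits come from the same arithmetic, assuming 512-byte (2^9) sectors as the text does: the number of addressable sectors times the sector size.

```latex
\begin{aligned}
2^{32}\ \text{sectors} \times 2^{9}\ \tfrac{\text{bytes}}{\text{sector}} &= 2^{41}\ \text{bytes} = 2\ \text{TiB} \\
2^{64}\ \text{sectors} \times 2^{9}\ \tfrac{\text{bytes}}{\text{sector}} &= 2^{73}\ \text{bytes} = 8\ \text{ZiB}
\end{aligned}
```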

+ +

We need to ensure that FreeBSD supports large storage sizes and that + the benefits of UFS2 can actually be realized so that FreeBSD can remain + relevant in the enterprise world. This page describes known issues and + limits and provides a focus for further auditing, validation, and + fixing.

+ +

Limits on disk partitioning

+

The first limit encountered is in disk partitioning. For x86 and amd64 + PCs, the FDISK MBR table is used by the BIOS to partition the disk into + logical extents and to identify which partition ('slice' in FreeBSD + terms) to boot from. The MBR is defined to use 32-bit sector offsets + and sizes, and since it's an industry standard and interoperability is + required, there is nothing that can be done to change this. As long as + booting a PC requires the MBR, the boot slice in FreeBSD will be limited + to 2TB.
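The 32-bit limit is visible directly in the on-disk layout. Each of the four 16-byte MBR partition entries stores its starting sector and its length in sectors as 32-bit little-endian fields, at offsets 8 and 12 within the entry. The sketch below decodes those fields; the struct and function names are hypothetical, and 512-byte sectors are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of one 16-byte MBR partition entry. */
struct mbr_slice {
    uint8_t  type;      /* partition type byte (offset 4) */
    uint32_t lba_start; /* first sector (offset 8, little-endian) */
    uint32_t lba_count; /* number of sectors (offset 12, little-endian) */
};

/* Read a 32-bit little-endian value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a raw entry; both address fields are only 32 bits wide. */
struct mbr_slice mbr_decode(const uint8_t entry[16])
{
    struct mbr_slice s;
    s.type = entry[4];
    s.lba_start = le32(entry + 8);
    s.lba_count = le32(entry + 12);
    return s;
}

/* Slice size in bytes (512-byte sectors assumed). The 64-bit product
 * cannot overflow, but lba_count itself tops out at 2^32 - 1. */
uint64_t mbr_slice_bytes(struct mbr_slice s)
{
    return (uint64_t)s.lba_count * 512;
}
```

An entry with the largest possible sector count describes 2^41 - 512 bytes, one sector short of 2TB.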

+ +

The GPT partitioning scheme was introduced with the ia64 architecture + as an MBR replacement. It provides 64-bit offsets and allows for an + arbitrary number of partitions. It also provides a compatibility mode + in which an MBR-compatible structure is written to the disk for use by + systems that don't understand GPT. However, to get the full benefits + for boot storage, both the BIOS and the FreeBSD loader must understand + it. For secondary storage, GPT can be used by any architecture + regardless of BIOS or boot support.
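For comparison with the MBR, a GPT partition entry is 128 bytes long and stores its first and last sectors (StartingLBA at offset 32, EndingLBA at offset 40, the latter inclusive) as 64-bit little-endian values. The sketch below reads them; the helper names are hypothetical and the rest of the entry (type and unique GUIDs, attributes, name) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit little-endian value. */
static uint64_t le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: number of sectors covered by one GPT entry.
 * Both LBA fields are 64 bits wide, and EndingLBA is inclusive. */
uint64_t gpt_entry_sectors(const uint8_t entry[128])
{
    uint64_t first = le64(entry + 32);
    uint64_t last = le64(entry + 40);
    return last - first + 1;
}
```

A partition of 2^33 sectors (4TB at 512 bytes per sector) is easy to describe here, but its sector count could not be stored in an MBR entry at all.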

+ +

Many systems don't require an MBR or GPT, and even PCs don't require + one if booting and interoperating with other OSes is not needed. The + next limit, though, is the BSD disklabel. This label defines up to 8 + partitions on a disk, MBR slice, or other storage extent for filesystems + and swap space. Unfortunately, the on-disk format of the disklabel again + uses 32-bit quantities, so it is also limited to 2TB. Fixing this would + require creating a new format that is incompatible with the old one, + along with an update to the FreeBSD boot loader. This would complicate + interoperability and the upgrade path. Also, if a new format is going + to be created, it should also address the current 8-partition limit. + Given these requirements, it's tempting to simply adopt the GPT format + for secondary storage partitioning instead.
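As a rough illustration of the disklabel limit, the sketch below checks whether a partition, described by a 64-bit starting sector and sector count, could be recorded in a label whose per-partition offset and size fields are 32 bits wide (as p_offset and p_size are in the traditional struct partition of <sys/disklabel.h>). The function and macro names are hypothetical, and the check deliberately ignores every other label constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 8-partition limit described above (hypothetical macro name). */
#define LABEL_MAXPARTITIONS 8

/* Hypothetical check: is a slot still free, and do the starting sector
 * and sector count each fit in a 32-bit label field? */
bool fits_in_disklabel(uint64_t offset, uint64_t size, int used_slots)
{
    if (used_slots >= LABEL_MAXPARTITIONS)
        return false;
    return offset <= UINT32_MAX && size <= UINT32_MAX;
}
```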

+ + +

Testing large capacities

+

Even though large drives are cheap, it still isn't always feasible or + economical to test on real hardware. Swap-backed memory disks, via the + md(4) driver, can substitute for some of the testing. Backing with swap + means that only the pages that are dirtied by data are actually + allocated, so a multi-terabyte storage device can be simulated with a + minimum of physical RAM and swap. Note that this is less true with UFS1, + since newfs initializes all of the inode blocks, which dirties quite a + bit of data. For UFS2, however, swap-backed md has the potential to work + well. Unfortunately, the kernel md driver has a number of 32-bit size + limits of its own that need to be fixed; details are provided below.
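The reason a swap-backed md disk is cheap is that backing store is consumed only for pages that have actually been written. The sketch below models that accounting by counting the distinct pages touched by a list of writes. It is a simplified model, not md(4)'s implementation; the 4096-byte page size and all names are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SZ 4096u /* assumed page size */

struct write_op {
    uint64_t offset; /* byte offset into the simulated disk */
    uint64_t length; /* bytes written; assumed to be > 0 */
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Count the distinct pages touched by a list of writes. In this model,
 * only these pages would need backing store. Returns 0 if there are no
 * writes or if the temporary array cannot be allocated. */
size_t dirty_pages(const struct write_op *ops, size_t n)
{
    size_t total = 0, k = 0, distinct = 0;
    uint64_t *pages;

    for (size_t i = 0; i < n; i++)
        total += (ops[i].offset + ops[i].length - 1) / PAGE_SZ -
                 ops[i].offset / PAGE_SZ + 1;
    if (total == 0 || (pages = malloc(total * sizeof *pages)) == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t last = (ops[i].offset + ops[i].length - 1) / PAGE_SZ;
        for (uint64_t p = ops[i].offset / PAGE_SZ; p <= last; p++)
            pages[k++] = p;
    }
    qsort(pages, total, sizeof *pages, cmp_u64);
    for (size_t i = 0; i < total; i++)
        if (i == 0 || pages[i] != pages[i - 1])
            distinct++;
    free(pages);
    return distinct;
}
```

In this model, a 512-byte write at the start of the disk, a second small write to the same page, and an 8KB write 1TB into the disk dirty only three pages, no matter how large the simulated disk is.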

+ +

It is still possible to avoid disklabels and MBRs for testing by + running newfs directly on the raw disk or md device. Sysinstall can be + tested from a running system by selecting Expert mode and performing + only the MBR and disklabel steps. Beware that sysinstall might have + other bugs that could wipe out your existing system, so take care + here!

+ + +

Userland Tool Status

+ +

The following userland tools need auditing and testing for 64-bit + cleanliness:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Task Responsible Last updated Status Details
newfs_ffs  &status.new;A quick audit of newfs shows that the '-s' option is parsed with + atoi(), which returns an int and cannot report overflow, instead of + strtoull() or an equivalent. A more thorough audit is needed to see + if other integer limits exist.
df  &status.new;An audit is needed to make sure that all reported fields are + 64-bit clean. There are reports of certain fields being incorrect or + negative on NFS volumes, which could be either an NFS or a df + problem.
du  &status.new;An audit is needed to make sure that all reported fields are + 64-bit clean.
growfs&a.scottl;12 Sept 2004&status.wip;Growfs has problems expanding into new cylinder groups. It also + initializes UFS2 inode blocks instead of leaving them for lazy + initialization, and it needs a 64-bit audit.
sysinstall  &status.new;A full audit is needed. There are reports of problems with + partitions larger than 1TB.
fsck_ffs  &status.new;A full audit is needed.
+ + +
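The newfs '-s' issue in the table above is a common pattern: atoi() returns an int and has no way to report overflow or malformed input. A 64-bit-clean replacement can use strtoull() with full error checking, roughly as sketched below. The function name parse_sectors() is hypothetical and is not taken from the newfs source.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for a sector count given on the command line.
 * Returns 0 and stores the value in *out on success, -1 on error. */
int parse_sectors(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    /* strtoull() skips leading white space itself, but skip it here so
     * the sign check below sees the first significant character. */
    while (isspace((unsigned char)*s))
        s++;
    /* strtoull() accepts a leading '-' and negates the result, so "-1"
     * would otherwise become a huge size; reject it explicitly. */
    if (*s == '-')
        return -1;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;
    *out = (uint64_t)v;
    return 0;
}
```

If the option is interpreted as a count of 512-byte sectors, an int-based parser caps it at 2^31 - 1 sectors, just under 1TB, while the strtoull() version accepts any count that fits in 64 bits.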
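For df and du, a typical 64-bit-cleanliness bug is multiplying a block count by a block size in 32-bit arithmetic before reporting it. The sketch below contrasts the two forms; the function names are hypothetical and are not taken from the df or du sources.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Buggy pattern: the product is computed in 32 bits and wraps. */
uint32_t bytes_32(uint32_t blocks, uint32_t bsize)
{
    return blocks * bsize;
}

/* 64-bit clean: both operands are widened before multiplying. */
uint64_t bytes_64(uint64_t blocks, uint64_t bsize)
{
    return blocks * bsize;
}

/* Print a size in 1K blocks with the matching PRIu64 format macro. */
void print_kblocks(uint64_t bytes)
{
    printf("%" PRIu64 "\n", bytes / 1024);
}
```

With 8388608 blocks of 1024 bytes (8GB), bytes_32() wraps to 0 while bytes_64() returns 8589934592, and print_kblocks(bytes_64(8388608, 1024)) prints 8388608.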

Kernel Driver Status

+ +

Many storage peripherals simply are not designed to handle capacities + over 2TB. For those that are, an audit should be done to verify that + their drivers handle the sizes correctly and pass them correctly to the + rest of the kernel.
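One place where a driver can get this wrong, even for a device that supports large capacities, is converting a reported capacity into a media size. Some interfaces report the last addressable block rather than a block count; the SCSI READ CAPACITY(10) command, for example, returns the last LBA and the block length as 32-bit values. The sketch below shows how adding one in 32-bit arithmetic wraps to zero; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: the block count is computed in 32 bits, so a last LBA of
 * 0xFFFFFFFF yields a count of 0 and a media size of 0. */
uint64_t mediasize_buggy(uint32_t last_lba, uint32_t block_len)
{
    uint32_t blocks = last_lba + 1;
    return (uint64_t)blocks * block_len;
}

/* Correct: widen to 64 bits before adding one and multiplying. */
uint64_t mediasize(uint64_t last_lba, uint32_t block_len)
{
    return (last_lba + 1) * block_len;
}
```

For READ CAPACITY(10) in particular, a returned last LBA of 0xFFFFFFFF also signals that the device is too large for that command and READ CAPACITY(16) must be used instead, so a driver has to treat that value specially rather than as a real size.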

+ + + + + + + + + + + + + + + + +
Task Responsible Last updated Status Details
md&a.scottl;12 Sept 2004&status.wip;A number of sizes and offsets are tracked using the 'unsigned' + data type, so it appears that the driver cannot handle sizes greater + than 2TB with a 512-byte sector size. For swap-backed md devices, the + page counter is also stored as a 32-bit quantity, which might also be + a limiting factor.
+ + &footer; + + diff --git a/en/projects/bigdisk/style.css b/en/projects/bigdisk/style.css new file mode 100644 index 0000000000..beecc6f17a --- /dev/null +++ b/en/projects/bigdisk/style.css @@ -0,0 +1,38 @@ +BODY { +} + +BODY TD { + font-size: 13px; +} + +BODY SMALL { + width: 615px; + font-size: 11px; +} + +.heading { + font-size: 15px; + background-color: #cbd2ec; +} + +.section { + font-size: 15px; + font-weight: bold; + background-color: #e7e9f7; +} + +.notes { + font-size: 13px; + font-weight: normal; +} + +.main { + width: 615px; + height: auto; + text-align: justify; +} + +.list { + width: 550px; + height: auto; +}