diff --git a/en/projects/bigdisk/index.sgml b/en/projects/bigdisk/index.sgml
index 308ef61d58..c0211d9c70 100644
--- a/en/projects/bigdisk/index.sgml
+++ b/en/projects/bigdisk/index.sgml
@@ -1,285 +1,285 @@
&header;

Contents

Purpose and background

The UFS filesystem

When the UFS filesystem was introduced to BSD in 1982, its use of 32 bit offsets and counters to address the storage was considered to be ahead of its time. Since most fixed-disk storage devices use 512 byte sectors, 32 bit sector numbers allowed for 2 Terabytes of storage. That was an almost unimaginable quantity for the time. But now that 250 and 400 Gigabyte disks are available at consumer prices, it's trivial to build a hardware- or software-based storage array that exceeds 2TB for a few thousand dollars.
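The 2TB figure follows directly from the sector arithmetic: 2^32 sectors × 512 bytes = 2^41 bytes (2 TiB). Below is a small sketch for checking limits of this kind, assuming a fixed sector size. `addressable_bytes` is a hypothetical helper and not a FreeBSD function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes addressable by an unsigned sector number
 * that is `bits` wide, with `sector_size`-byte sectors.  The result must
 * fit in 64 bits, so bits + log2(sector_size) has to stay below 64.
 */
uint64_t
addressable_bytes(unsigned bits, uint64_t sector_size)
{
	assert(bits < 64);
	return ((uint64_t)1 << bits) * sector_size;
}
```

With 4096-byte sectors, the same 32 bit field reaches 2^44 bytes (16 TiB).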

The UFS2 filesystem was introduced in 2003 as a replacement for the original UFS and provides 64 bit counters and offsets. This allows files and filesystems to grow to 2^73 bytes (2^64 * 512) in size, which will hopefully be sufficient for quite a long time. UFS2 largely solved the storage size limits imposed by the filesystem itself. Unfortunately, many tools and storage mechanisms still use or assume 32 bit values, often keeping FreeBSD limited to 2TB.
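A tool typically "keeps FreeBSD limited to 2TB" by passing a 64 bit size through a 32 bit variable somewhere along the way. The sketch below shows the silent wrap-around next to a checked alternative. The function names are hypothetical and not taken from any FreeBSD utility.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy pattern (hypothetical): the sector count is reduced modulo 2^32. */
uint32_t
sectors_truncated(uint64_t media_bytes, uint32_t sector_size)
{
	assert(sector_size != 0);
	return (uint32_t)(media_bytes / sector_size);
}

/* Checked pattern (hypothetical): fail instead of returning a wrapped value. */
bool
sectors_checked(uint64_t media_bytes, uint32_t sector_size, uint32_t *out)
{
	uint64_t sectors;

	assert(sector_size != 0 && out != 0);
	sectors = media_bytes / sector_size;
	if (sectors > UINT32_MAX)
		return false;
	*out = (uint32_t)sectors;
	return true;
}
```

For a 3 TiB volume, `sectors_truncated` reports 2^31 sectors, so the disk appears to be only 1 TiB. `sectors_checked` refuses the value instead.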

We need to ensure that FreeBSD supports large storage sizes and that the benefits of UFS2 can actually be realized so that FreeBSD can remain relevant in the enterprise world. This page describes known issues and limits and provides a focus for further auditing, validation, and fixing.

Limits on disk partitioning

The first limit that is encountered is in disk partitioning. For x86 and amd64 PCs, the BIOS uses the FDISK MBR table to partition the disk into logical extents and to identify which partition ('slice' in FreeBSD terms) to boot from. The MBR is defined to use 32 bit disk offsets. Since it's an industry standard and interoperability is required, there is nothing that can be done to change this. As long as booting a PC requires the MBR, the boot slice in FreeBSD is going to be limited to 2TB.
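The 32 bit limit is visible directly in the on-disk MBR entry. The four 16-byte slice entries start at byte offset 446 of the boot sector. In each entry, the starting LBA is stored in bytes 8-11 and the length in sectors in bytes 12-15, both as little-endian 32 bit values. The decoder below is an illustrative sketch of that layout. `mbr_read_slice` and the other names are hypothetical and do not come from FreeBSD's fdisk.

```c
#include <assert.h>
#include <stdint.h>

#define MBR_TABLE_OFFSET 446	/* 0x1BE: start of the four-entry table */
#define MBR_ENTRY_SIZE   16

/* Hypothetical decoded form of one MBR slice entry. */
struct mbr_slice {
	uint8_t  type;		/* partition type byte, e.g. 0xA5 for FreeBSD */
	uint32_t lba_start;	/* first sector: only 32 bits on disk */
	uint32_t lba_count;	/* length in sectors: also 32 bits */
};

static uint32_t
le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode table entry `idx` (0..3) from a 512-byte MBR sector. */
struct mbr_slice
mbr_read_slice(const uint8_t sector[512], unsigned idx)
{
	const uint8_t *e;
	struct mbr_slice s;

	assert(idx < 4);
	e = sector + MBR_TABLE_OFFSET + idx * MBR_ENTRY_SIZE;
	s.type = e[4];
	s.lba_start = le32(e + 8);
	s.lba_count = le32(e + 12);
	return s;
}

/* End of the slice in bytes, computed in 64 bits so the sum cannot wrap. */
uint64_t
mbr_slice_end_bytes(struct mbr_slice s)
{
	return ((uint64_t)s.lba_start + s.lba_count) * 512;
}
```

Even the largest encodable length, 0xFFFFFFFF sectors, is just under 2 TiB at 512 bytes per sector.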

The GPT partitioning scheme was introduced with the ia64 architecture as an MBR replacement. It provides 64 bit offsets and allows for an arbitrary number of partitions. It also provides a compatibility mode with MBR where it can generate an MBR-compatible structure on the disk for use with systems that don't understand GPT. However, to get the full benefits for boot storage, the BIOS and the FreeBSD loader must understand it. For secondary storage, GPT can be used by any architecture regardless of BIOS or boot support.
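For comparison with the MBR, a GPT partition entry stores its first and last LBAs as 64 bit little-endian fields at byte offsets 32 and 40 of the entry. The last LBA is inclusive. Below is a minimal sketch of reading a partition's size from one entry. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GPT_ENT_OFF_START 32	/* first LBA, 64-bit little-endian */
#define GPT_ENT_OFF_END   40	/* last LBA (inclusive), 64-bit little-endian */

/* Hypothetical helper: decode a little-endian 64-bit value. */
uint64_t
le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = v << 8 | p[i];
	return v;
}

/* Hypothetical helper: size in sectors of the partition in one GPT entry. */
uint64_t
gpt_entry_sectors(const uint8_t *ent)
{
	uint64_t first = le64(ent + GPT_ENT_OFF_START);
	uint64_t last = le64(ent + GPT_ENT_OFF_END);

	assert(last >= first);
	return last - first + 1;	/* the ending LBA is inclusive */
}
```

A partition of 2^33 sectors (4 TiB at 512 bytes per sector) is represented exactly. No MBR entry can hold that value.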

Many systems don't require an MBR or GPT, and even PCs don't require one if booting and interoperating with other OSes is not required. The next limit, though, is the BSD disklabel. This label defines up to 8 partitions on a disk, MBR slice, or other storage extent for filesystems and swap space. Unfortunately, the on-disk format of the disklabel again uses 32 bit quantities, so it is also limited to 2TB. Fixing this would require creating a new format that is incompatible with the old one, along with an update to the FreeBSD boot loader. This would complicate interoperability and the upgrade path. Also, if a new format is going to be created, it should also address the current 8 partition limit. Given these requirements, it's tempting to simply adopt the GPT format for secondary storage partitioning instead.
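Any label-editing tool has to reject extents that do not fit the 32 bit on-disk fields rather than truncate them. The sketch below mirrors only the offset and size of a disklabel partition entry. `struct label_part` and `label_part_set` are simplified, hypothetical stand-ins, not the real `struct disklabel`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of a disklabel partition entry. */
struct label_part {
	uint32_t p_size;	/* number of sectors: 32 bits on disk */
	uint32_t p_offset;	/* starting sector: 32 bits on disk */
};

/*
 * Store an extent given in 64-bit sector numbers, refusing values that
 * the 32-bit on-disk fields would silently truncate.
 */
bool
label_part_set(struct label_part *pp, uint64_t offset, uint64_t size)
{
	assert(pp != 0);
	if (offset > UINT32_MAX || size > UINT32_MAX)
		return false;
	pp->p_offset = (uint32_t)offset;
	pp->p_size = (uint32_t)size;
	return true;
}
```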

Testing large capacities

Even though large drives are cheap, it still isn't always feasible or economical to test on real hardware. Swap-backed memory disks, via the md(4) driver, can provide a good substitute for some of the testing. Backing with swap means that only the pages that are dirtied by data are actually allocated, so a multi-terabyte disk can be simulated with a minimal amount of physical RAM+swap. Note that this is less true with UFS1, since newfs initializes all of the UFS1 inode blocks, which dirties quite a bit of data. But for UFS2, swap-backed md has the potential to work well. Unfortunately, the kernel md driver has a number of 32-bit size limits of its own that need to be fixed. Details are provided below.
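A sparse file in userland shows the same principle: storage is allocated only for pages that are actually written. The sketch below is an analogy and does not exercise md(4) itself. It uses the POSIX calls `ftruncate()`, `pwrite()` and `fstat()`. The function name and the file path in the test are hypothetical.

```c
#define _XOPEN_SOURCE 700	/* for pwrite(), ftruncate() and st_blocks */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: create a `logical`-byte sparse file at `path`,
 * dirty a single 4 KiB page in the middle, and return how many bytes
 * the filesystem actually allocated (st_blocks counts 512-byte units).
 * The file is removed before returning; -1 means an error occurred.
 */
long long
sparse_alloc_after_one_write(const char *path, off_t logical)
{
	char page[4096] = { 1 };
	struct stat sb;
	int fd, ok;

	assert(path != NULL && logical >= (off_t)sizeof(page));
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	ok = ftruncate(fd, logical) == 0 &&
	    pwrite(fd, page, sizeof(page), logical / 2) == (ssize_t)sizeof(page) &&
	    fstat(fd, &sb) == 0;
	close(fd);
	unlink(path);
	return ok ? (long long)sb.st_blocks * 512 : -1;
}
```

On a filesystem that supports sparse files, a 1 GiB file with a single dirtied page allocates only a few KiB.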

It is still possible to avoid disklabels and MBRs for testing by running newfs directly on the raw disk or md disk. Sysinstall can be tested from a running system by selecting Expert mode and performing only the MBR and disklabel steps. Beware that sysinstall might have other bugs that will wipe out your existing system, so take care here!

Userland Tool Status

The following userland tools need auditing and testing for 64-bit cleanliness:

Task Responsible Last updated Status Details
newfs &a.pjd; 19 Sept 2004 &status.done; Handling of the '-s' option was fixed. newfs should now be fully usable for large file systems.
df     &status.new; An audit is needed to make sure that all reported fields are 64-bit clean. There are reports of certain fields being incorrect or negative on NFS volumes, which could be either an NFS or a df problem.
du &a.pjd; 7 Jan 2005 &status.done; Handling of big files and directories was broken. It has been fixed, and du should now be fully usable on large file systems with large files and directories.
growfs &a.scottl; 12 Sept 2004 &status.wip; growfs has problems with expanding to new cylinder groups. It also initializes UFS2 inode blocks instead of leaving them for lazy initialization. It also needs a 64-bit audit.
sysinstall     &status.new; A full audit is needed. Reports exist of problems with >1TB partitions.
fsck_ffs &a.pb; 15 Jan 2005 &status.wip; A full audit is needed. At least some printf format changes are necessary.
dump/restore     &status.new; A full audit is needed. At least some printf format changes are necessary in dump(8).
fsdb     &status.new; A full audit is needed. At least some printf format changes are necessary.
quota tools     &status.new; Extensive changes are needed. Disk quotas are currently handled as 32-bit quantities, which limits the maximum possible quota to 2TB. Two tasks are needed: 1) have the current tools (kernel+userland, edquota for example) fail gracefully when presented with 64-bit quantities and 2) extend the quota file format and tools to 64-bit while providing a compatibility mode and/or migration tools.
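Several entries above call for "printf format changes". The usual portable fix for a 64 bit quantity is to cast it to `intmax_t` and print it with `%jd`, or to use the `PRId64` macro when the type is known to be `int64_t`. Passing a 64 bit value to `%d` or `%ld` is undefined on platforms where the sizes differ, such as i386. In practice it often prints a truncated or negative number. The formatting helpers below are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: %jd with an intmax_t cast works for any signed type. */
int
format_blocks(char *buf, size_t len, int64_t blocks)
{
	assert(buf != NULL || len == 0);
	return snprintf(buf, len, "%jd blocks", (intmax_t)blocks);
}

/* Hypothetical helper: PRId64 expands to the right conversion for int64_t. */
int
format_blocks_pri(char *buf, size_t len, int64_t blocks)
{
	assert(buf != NULL || len == 0);
	return snprintf(buf, len, "%" PRId64 " blocks", blocks);
}
```

The `intmax_t` form has the advantage of also working for types such as `off_t` or `daddr_t` whose width may vary between platforms.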

Kernel Driver Status

Many storage peripherals simply are not designed to handle >2TB capacities. For those that are, an audit should be done to verify that their drivers handle the sizes correctly and pass those sizes correctly to the rest of the kernel.

Task Responsible Last updated Status Details
md &a.pjd; 17 Sept 2004 &status.done; Swap backed disks can now be created up to 16TB in size on i386. This corresponds to 2^32*4096.
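The 16TB figure is 2^32 pages of 4096 bytes, i.e. 2^44 bytes. Size limits of this kind commonly come from page-index arithmetic done in 32 bits. The sketch below is illustrative only and is not taken from the md(4) source. It uses hypothetical helper names to contrast a multiply that wraps at 4 GiB with the widened form.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u	/* i386 page size, as in the note above */

/* 2^32 pages of 4 KiB each is 2^44 bytes, i.e. 16 TiB. */
static_assert((UINT64_C(1) << 32) * PAGE_SIZE_BYTES == UINT64_C(1) << 44,
    "2^32 pages x 4096 bytes = 16 TiB");

/* Buggy (hypothetical): the product is truncated to 32 bits and wraps every 4 GiB. */
uint32_t
page_to_offset_32(uint32_t pindex)
{
	return (uint32_t)(pindex * PAGE_SIZE_BYTES);
}

/* Fixed (hypothetical): widen the page index before multiplying. */
uint64_t
page_to_offset_64(uint32_t pindex)
{
	return (uint64_t)pindex * PAGE_SIZE_BYTES;
}
```

Page 2^20 maps to byte offset 0 with the 32 bit version, but to 4 GiB with the widened one.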

Subsystem Status

Some filesystem-related subsystems require testing with >2TB volumes, or need to be adapted. The following areas have been identified:

Task Responsible Last updated Status Details
snapshots &a.pb; 15 Jan 2004 &status.wip; Taking snapshots fails on filesystems >2TB, returning EFBIG (on a 5TB filesystem) and subsequently crashing the system in softupdates.
quotas     &status.new; The quota subsystem handles 32-bit quantities, which limits quotas to 2TB. The syncer has been observed to block while attempting to set quotas over that limit (try 4000000000 KBytes as a hard limit in edquota(8) for some uid, then create some files owned by that uid). See also the userland entry for quota tools.
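The following sketch shows the kind of graceful failure the quota entries ask for. It assumes, for illustration, that a limit is stored as an unsigned 32 bit count of 512-byte blocks. Under that assumption 1 KByte is 2 blocks, and the largest storable limit is just under 2 TiB. `quota_kb_to_blocks` is a hypothetical name, not part of edquota(8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical conversion of a limit given in KBytes to a 32-bit count of
 * 512-byte blocks (assumed storage format).  Out-of-range limits are
 * rejected instead of being silently truncated.
 */
bool
quota_kb_to_blocks(uint64_t kbytes, uint32_t *blocks)
{
	uint64_t b;

	assert(blocks != NULL);
	if (kbytes > UINT64_MAX / 2)
		return false;
	b = kbytes * 2;
	if (b > UINT32_MAX)
		return false;
	*blocks = (uint32_t)b;
	return true;
}
```

Under this assumption, the 4000000000 KByte limit used in the reproduction above becomes 8,000,000,000 blocks. That exceeds 2^32 - 1, so the value is rejected.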
&footer;
diff --git a/en/projects/busdma/index.sgml b/en/projects/busdma/index.sgml
index 20cc7ea714..9a04d997de 100644
--- a/en/projects/busdma/index.sgml
+++ b/en/projects/busdma/index.sgml
@@ -1,1296 +1,1296 @@
&header;

Contents

Project Goal

busdma

The busdma interfaces permit hardware device drivers to operate on a variety of platforms without encoding platform-specific access methods into the drivers. This lowers the maintenance costs for drivers across platforms, and improves the chances that a driver will "just work" on a new platform. Modifying a driver to make use of busdma is relatively straightforward, but does require familiarity with both the device driver and busdma primitives. For busdma to be used in FreeBSD, two sets of changes are generally required: adaptation of the busdma implementation to run on all platforms, and adaptation of drivers to use the framework. As such, status information on this project is broken down into platform support and driver support (sorted by category). Completing this work requires a thorough audit of the system device drivers, followed by prioritized conversion of drivers. Drivers are also expected to use bus_space functions, and this column is sometimes used to denote a driver in need of conversion to bus_space as well.

INTR_MPSAFE

Hardware drivers register their interrupt handler with the bus_setup_intr() function. Setting the flag INTR_MPSAFE tells the system interrupt code to call the interrupt routine without holding the Giant mutex. This can give a significant performance gain on SMP systems.

Drivers can set this flag even if they are not fully locked down as long as their interrupt routine is careful about not touching other data structures in the driver. An easy way to do this is to check and clear the hardware interrupt status registers and then schedule the interrupt processing for a taskqueue or kernel thread.
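The "check and clear, then defer" pattern can be modelled in userland with C11 atomics. In this model, `hw_status` stands in for a read-to-clear interrupt status register, and `pending` stands in for work handed to a taskqueue. Both are hypothetical and not FreeBSD APIs. The handler touches nothing else in the softc, which is what makes it safe to run without Giant.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userland model of a driver softc. */
struct fake_softc {
	_Atomic uint32_t hw_status;		/* bits set by the "hardware" */
	_Atomic uint32_t pending;		/* bits awaiting the deferred task */
	_Atomic unsigned tasks_scheduled;	/* counts simulated enqueues */
};

/* Interrupt handler: reads and clears status, records the work, nothing more. */
void
fake_intr(struct fake_softc *sc)
{
	uint32_t status;

	assert(sc != NULL);
	status = atomic_exchange(&sc->hw_status, 0);	/* check and clear */
	if (status == 0)
		return;				/* not our interrupt */
	atomic_fetch_or(&sc->pending, status);
	atomic_fetch_add(&sc->tasks_scheduled, 1);	/* stands in for taskqueue_enqueue(9) */
}

/* Deferred task: runs later in thread context and drains all pending work. */
uint32_t
fake_task(struct fake_softc *sc)
{
	assert(sc != NULL);
	return atomic_exchange(&sc->pending, 0);
}
```

A spurious interrupt (status already clear) returns immediately. Several interrupts that arrive before the task runs are coalesced into one `pending` mask.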

SMPng locked

Drivers should employ mutexes and sx locks to protect their data structures and hardware registers from competing threads. Mutex operations are somewhat expensive, so a good strategy is to perform as many operations as possible under a single mutex acquisition.
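Below is a sketch of batching updates under one lock acquisition. It uses POSIX threads as a userland stand-in for the kernel's mtx_lock(9)/mtx_unlock(9). The statistics structure and function are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-driver statistics protected by one mutex. */
struct rx_stats {
	pthread_mutex_t mtx;
	uint64_t packets;
	uint64_t bytes;
	uint64_t errors;
};

/*
 * Account for a whole batch of received frames with a single lock
 * acquisition, instead of locking once per counter or once per frame.
 */
void
rx_stats_update(struct rx_stats *st, const uint32_t *lens, int n, int nerr)
{
	uint64_t bytes = 0;

	assert(st != NULL && n >= 0 && nerr >= 0);
	for (int i = 0; i < n; i++)	/* work that needs no lock is done first */
		bytes += lens[i];

	pthread_mutex_lock(&st->mtx);
	st->packets += (uint64_t)n;
	st->bytes += bytes;
	st->errors += (uint64_t)nerr;
	pthread_mutex_unlock(&st->mtx);
}
```

Summing outside the lock keeps the critical section short. Updating all three counters together also means other threads never observe a half-applied batch.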

a!=p safety

With Intel PAE support, pointers and physical addresses have different sizes. This means that drivers must use vm_paddr_t or bus_addr_t rather than assuming that a physical address can be represented as a void *. In addition, format strings and casts must be handled carefully.
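Below is a sketch of handling a physical address that is wider than a pointer, as on i386 with PAE. `paddr64_t` is a hypothetical stand-in for a 64 bit vm_paddr_t, and the helpers are hypothetical too. Printing goes through a `uintmax_t` cast and `%jx`, so the format string is correct whatever the underlying width is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for vm_paddr_t under PAE: 64 bits wide. */
typedef uint64_t paddr64_t;

/* Print a physical address portably: cast to uintmax_t and use %jx. */
int
format_paddr(char *buf, size_t len, paddr64_t pa)
{
	assert(buf != NULL || len == 0);
	return snprintf(buf, len, "%#jx", (uintmax_t)pa);
}

/* Would this address survive being stored in a 32-bit i386 pointer? */
int
paddr_fits_i386_pointer(paddr64_t pa)
{
	return pa <= UINT32_MAX;
}
```

An address such as 0x123456000, which lies above 4 GiB, is valid under PAE's 36 bit physical addressing. It is truncated if punned through a 32 bit pointer or printed with `%x`.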

The task list below is not intended to be complete, but does represent a set of relevant and/or important components of the overall work. The "Responsible" field identifies a developer who has expressed willingness to be responsible for completing the identified task; this doesn't preclude others working on it, but suggests that coordination with the responsible party might be appropriate so as to avoid unnecessary duplication of work, and to maximize forward progress. If beginning work on a new area of substantial size, or one that appears unclaimed, it may be worth dropping an e-mail to &a.mux; to see if any progress has been made.

The definition of the date field varies depending on the status of a task. For completed tasks, it refers to the date completed or reported completed. For in-progress tasks, it refers to the date of the last update of the entry. For stalled tasks, it refers to the date that the task was declared stalled. For new tasks, it refers to the date the task was added to the list.

Tasks are sorted first by status, then by date.

Resources and Links

A series of manual pages related to this project can be found here:

Platform Support Status

Task Responsible Last updated Status Details
alpha &a.ticso; November 14, 2005 &status.done; There are problems for systems with large amounts of memory.
amd64 &a.peter; July 1, 2003 &status.done; Fully supported.
arm &a.cognet; December 23, 2005 &status.done; Fully supported.
ia64 &a.marcel; December 10, 2002 &status.done; There may be problems for systems with large amounts of memory.
i386 &a.sam; December 9, 2002 &status.done; Fully supported.
powerpc &a.grehan; January 15, 2003 &status.done; Fully supported.
sparc64 &a.tmm; January 6, 2003 &status.done; Fully supported.

Network Interface Driver Status

Driver Responsible Last updated busdma INTR_MPSAFE SMPng locked a!=p Notes
if_an   December 23, 2005 &status.unknown; &status.unknown; &status.unknown; &status.unknown;  
if_ar     &status.new; &status.new; &status.new; &status.new; kvtop()
if_bge &a.wpaul; April 13, 2004 &status.done; &status.done; &status.done; &status.done;  
if_cp &a.rik; October 31, 2005 &status.done; &status.done; &status.done; &status.new;  
if_cs &a.imp; December 23, 2005 &status.new; &status.new; &status.new; &status.unknown; Needs bus_space conversion
if_ct &a.rik; October 31, 2005 &status.done; &status.done; &status.done; &status.new;  
if_cx &a.rik; June 24, 2004 &status.done; &status.wip; &status.wip; &status.new;  
if_dc &a.mux; August 19, 2005 &status.done; &status.done; &status.done; &status.done;  
if_de &a.mux; August 17, 2005 &status.done; &status.done; &status.done; &status.new;  
if_ed &a.imp; December 23, 2005 &status.done; &status.done; &status.done; &status.done;  
if_em &a.pdeuskar; April 13, 2004 &status.done; &status.done; &status.done; &status.done;  
if_en &a.harti; November 2, 2005 &status.done; &status.new; &status.new; &status.done; Locking present; not yet marked INTR_MPSAFE?
if_ep &a.mdodd;,&a.imp; April 13, 2004 &status.done; &status.done; &status.done; &status.done;  
if_ex &a.imp; December 23, 2005 &status.done; &status.new; &status.new; &status.done;  
if_fatm &a.harti; November 2, 2005 &status.done; &status.done; &status.done; &status.done;  
if_fwe     &status.new; &status.new; &status.new; &status.new;  
if_fxp &a.mux; April 13, 2004 &status.done; &status.done; &status.done; &status.done;  
if_gem &a.tmm; July 31, 2005 &status.done; &status.done; &status.done; &status.new;  
if_hatm &a.harti; November 2, 2005 &status.done; &status.done; &status.done; &status.done;  
if_hme &a.tmm; January 30, 2005 &status.done; &status.done; &status.done; &status.done;  
if_idt     &status.new; &status.new; &status.new; &status.new; vtophys()
if_le &a.marius; January 31, 2006 &status.done; &status.done; &status.done; &status.done;  
if_lge   November 23, 2005 &status.new; &status.done; &status.done; &status.new; vtophys()
if_lmc   February 11, 2006 &status.done; &status.done; &status.done; &status.unknown; Untested on PAE
if_mn     &status.new; &status.new; &status.new; &status.new; vtophys(). Please contact &a.phk; for info/hardware.
if_my   August 17, 2005 &status.new; &status.done; &status.done; &status.new; vtophys()
if_nge   August 17, 2005 &status.new; &status.done; &status.done; &status.new; vtophys()
if_nve   November 23, 2005 &status.new; &status.done; &status.done; &status.new; vtophys()
if_pcn &a.obrien; August 19, 2005 &status.new; &status.done; &status.done; &status.new; vtophys()
if_pdq     &status.new; &status.new; &status.new; &status.new; Mostly busdma, except for vtophys().
if_re   May 30, 2005 &status.done; &status.done; &status.done; &status.done;  
if_rl &a.wpaul; April 13, 2004 &status.done; &status.done; &status.done; &status.new;  
if_sf   August 19, 2005 &status.new; &status.done; &status.done; &status.new; vtophys()
if_sis &a.wpaul; April 13, 2004 &status.done; &status.done; &status.done; &status.new;  
if_sk   April 27, 2006 &status.done; &status.done; &status.done; &status.new; vtophys()
if_sn &a.imp; December 23, 2005 &status.done; &status.done; &status.done; &status.done;  
if_snc   December 23, 2005 &status.unknown; &status.unknown; &status.unknown; &status.unknown; pc98-only device (although it could work with many CardBus bridges)
if_sr     &status.new; &status.new; &status.new; &status.new; vtophys()
if_ste   August 31, 2005 &status.new; &status.done; &status.done; &status.new; vtophys()
if_ti   December 13, 2005 &status.done; &status.done; &status.done; &status.done;  
if_tl   September 15, 2005 &status.new; &status.done; &status.done; &status.new;  
if_tx &a.mux; April 19, 2003 &status.done; &status.new; &status.new; &status.untested;  
if_txp   September 22, 2005 &status.wip; &status.done; &status.done; &status.new;  
if_vr   April 23, 2004 &status.new; &status.done; &status.done; &status.new;  
if_vx   September 22, 2005 &status.na; &status.done; &status.done; &status.new; Uses PIO to copy mbufs to and from hardware.
if_wb   September 22, 2005 &status.new; &status.done; &status.done; &status.new;  
if_wi &a.sam;, &a.imp; November 4, 2003 &status.unknown; &status.done; &status.unknown; &status.unknown; This driver needs lots of help
if_xe &a.imp; December 23, 2005 &status.done; &status.done; &status.done; &status.done;  
if_xl &a.mux; April 13, 2004 &status.done; &status.done; &status.done; &status.done;  

Storage Device Driver Status

Driver Responsible Last updated busdma INTR_MPSAFE SMPng locked a!=p Notes
aac &a.scottl;   January 31, 2005 &status.done; &status.done; &status.done; &status.done; Not endian clean.
adv   December 9, 2002 &status.done; &status.new; &status.new; &status.new;  
aha   April 13, 2004 &status.done; &status.wip; &status.wip; &status.new; Uses busdma, but may pun bus addresses with host addresses.
ahb   December 9, 2002 &status.done; &status.new; &status.new; &status.new;  
ahc &a.gibbs; January 31, 2005 &status.done; &status.new; &status.new; &status.done;  
ahd &a.gibbs; January 31, 2005 &status.done; &status.new; &status.new; &status.done;  
aic   December 23, 2005 &status.unknown; &status.unknown; &status.unknown; &status.unknown; Needs evaluation
amd   December 14, 2002 &status.done; &status.new; &status.new; &status.new;  
amr &a.scottl;   January 30, 2005 &status.done; &status.done; &status.done; &status.done;  
asr   January 4, 2003 &status.new; &status.new; &status.new; &status.new; vtophys(). Requires major work. A new I2O framework would be desirable.
ata &a.sos; December 9, 2002 &status.done; &status.done; &status.done; &status.done;  
buslogic     &status.new; &status.new; &status.new; &status.new; vtophys()
ciss   December 9, 2002 &status.done; &status.new; &status.new; &status.new;  
ct     &status.new; &status.new; &status.new; &status.new;  
dpt     &status.new; &status.new; &status.new; &status.new; vtophys()
fdc   December 23, 2005 &status.unknown; &status.unknown; &status.unknown; &status.unknown; Needs evaluation
ida   December 9, 2002 &status.done; &status.new; &status.new; &status.new;  
iir   March 1, 2006 &status.done; &status.done; &status.done; &status.done; 64-bit DMA without bouncing is possible, but needs work.
isp   February 8, 2003 &status.done; &status.done; &status.new; &status.new;  
ips &a.scottl;   January 30, 2005 &status.done; &status.done; &status.done; &status.done;  
mlx &a.scottl;   February 8, 2003 &status.done; &status.wip; &status.wip; &status.new;  
mly &a.scottl;   February 8, 2003 &status.done; &status.wip; &status.wip; &status.new;  
mpt   December 9, 2002 &status.done; &status.done; &status.new; &status.new;  
ncr     &status.new; &status.new; &status.new; &status.new; vtophys(). Please contact &a.phk; for a possible source of hardware.
ncv     &status.unknown; &status.unknown; &status.unknown; &status.unknown; Needs evaluation
nsp     &status.unknown; &status.unknown; &status.unknown; &status.unknown; Needs evaluation
pst   April 11, 2003 &status.new; &status.done; &status.new; &status.new; vtophys()
stg   December 9, 2002 &status.done; &status.new; &status.new; &status.new; At least, it looks like it may well be.
sym   December 19, 2002 &status.done; &status.new; &status.new; &status.new;  
trm &a.cognet; December 9, 2002 &status.done; &status.new; &status.new; &status.new;  
twe   December 9, 2002 &status.done; &status.new; &status.new; &status.new;  
wds   February 2, 2005 &status.done; &status.new; &status.new; &status.new;  

Miscellaneous Device Driver Status

Driver Responsible Last updated busdma INTR_MPSAFE SMPng locked a!=p Notes
agp   October 31, 2005 &status.new; &status.new; &status.new; &status.new; vtophys()
bktr &a.cognet; January 15, 2003 &status.wip; &status.new; &status.new; &status.new; vtophys()
digi     &status.new; &status.new; &status.new; &status.new; vtophys()
drm &a.anholt; October 27, 2003 &status.wip; &status.done; &status.done; &status.wip; vtophys(). The locking could use some review.
fb     &status.new; &status.new; &status.new; &status.new; vtophys()
firewire &a.simokawa; April 17, 2003 &status.done; &status.new; &status.new; &status.done; vtophys()
hfa     &status.new; &status.new; &status.new; &status.new; vtophys()
hifn &a.sam; April 13, 2004 &status.done; &status.done; &status.done; &status.new;  
musycc     &status.new; &status.new; &status.new; &status.new; vtophys(). Please contact &a.phk; for info/hardware.
pcm &a.cognet; February 20, 2003 &status.done; &status.done; &status.new; &status.new;  
ubsec &a.sam; April 13, 2004 &status.done; &status.done; &status.done; &status.new; vtophys() is used in debugging printf.
usb &a.jmg; July 24, 2003 &status.done; &status.new; &status.new; &status.untested; a!=p should be clean, but requires further testing.

Documentation Status

Task Responsible Last updated Status Notes
Manual pages for the busdma API &a.hmp; January 15, 2003 &status.done;  
&footer;