diff --git a/en/handbook/internals/chapter.sgml b/en/handbook/internals/chapter.sgml
index 565af21b0b..df6042c920 100644
--- a/en/handbook/internals/chapter.sgml
+++ b/en/handbook/internals/chapter.sgml
@@ -1,1864 +1,1864 @@
FreeBSD Internals
The FreeBSD Booting Process
Contributed by &a.phk;. v1.1, April
26th.
Booting FreeBSD is essentially a three step process: load the
kernel, determine the root filesystem and initialize user-land things.
This leads to some interesting possibilities shown below.
Loading a kernel
We presently have three basic mechanisms for loading the kernel as
described below: they all pass some information to the kernel to help
the kernel decide what to do next.
Biosboot
Biosboot is our “bootblocks”. It consists of
two files which will be installed in the first 8Kbytes of the
floppy or hard-disk slice to be booted from.
Biosboot can load a kernel from a FreeBSD filesystem.
Dosboot
Dosboot was written by DI. Christian Gusenbauer, and is
unfortunately at this time one of the few pieces of code that
will not compile under FreeBSD itself because it is written for
Microsoft compilers.
Dosboot will boot the kernel from an MS-DOS file or from a
FreeBSD filesystem partition on the disk. It attempts to
negotiate with the various and strange kinds of memory manglers
that lurk in high memory on MS-DOS systems and usually wins them
over.
Netboot
Netboot will try to find a supported Ethernet card, and use
BOOTP, TFTP and NFS to find a kernel file to boot.
Determine the root filesystem
Once the kernel is loaded and the boot-code jumps to it, the
kernel will initialize itself, trying to determine what hardware is
present and so on; it then needs to find a root filesystem.
Presently we support the following types of root
filesystems:
UFS
This is the most common type of root filesystem. It can
reside on a floppy or on hard disk.
MSDOS
While this is technically possible, it is not particularly
useful because of the FAT filesystem's
inability to deal with links, device nodes and other such
“UNIXisms”.
MFS
This is actually a UFS filesystem which has been compiled
into the kernel. That means that the kernel does not really
need any hard disks, floppies or other hardware to
function.
CD9660
This is for using a CD-ROM as root filesystem.
NFS
This is for using a fileserver as root filesystem, basically
making it a diskless machine.
Initialize user-land things
To get the user-land going, the kernel, when it has finished
initialization, will create a process with pid == 1
and execute a program on the root filesystem; this program is normally
/sbin/init.
You can substitute any program for /sbin/init,
as long as you keep in mind that:
there is no stdin/out/err unless you open it yourself. If you
exit, the machine panics. Signal handling is special for pid
== 1.
An example of this is the /stand/sysinstall
program on the installation floppy.
Interesting combinations
Boot a kernel with a MFS in it with a special
/sbin/init which...
A — Using DOS
mounts your C: as
/C:
Attaches C:/freebsd.fs on
/dev/vn0
mounts /dev/vn0 as
/rootfs
makes symlinks
/rootfs/bin ->
/bin
/rootfs/etc ->
/etc
/rootfs/sbin ->
/sbin (etc...)
Now you are running FreeBSD without repartitioning your hard
disk...
B — Using NFS
NFS mounts your server:~you/FreeBSD as
/nfs, chroots to /nfs
and executes /sbin/init there
Now you are running FreeBSD diskless, even though you do not
control the NFS server...
C — Start an X-server
Now you have an X-terminal, which is better than that dingy
X-under-windows-so-slow-you-can-see-what-it-does thing that your
boss insists is better than forking out money on hardware.
D — Using a tape
Takes a copy of /dev/rwd0 and writes it
to a remote tape station or fileserver.
Now you finally get that backup you should have made a year
ago...
E — Acts as a firewall/web-server/what do I
know...
This is particularly interesting since you can boot from a
write-protected floppy, but still write to your root
filesystem...
PC Memory Utilization
Contributed by &a.joerg;. 16 Apr
1995.
A short description of how FreeBSD uses memory on the i386
platform
The boot sector will be loaded at 0:0x7c00, and
relocates itself immediately to 0x7c0:0. (This is
nothing magic, just an adjustment for the %cs
selector, done by an ljmp.)
It then loads the first 15 sectors at 0x10000
(segment BOOTSEG in the biosboot Makefile), and sets
up the stack to work below 0x1fff0. After this, it
jumps to the entry of boot2 within that code. I.e., it jumps over
itself and the (dummy) partition table, and it is going to adjust the
%cs selector—we are still in 16-bit mode there.
boot2 asks for the boot file, and examines the
a.out header. It masks the file entry point
(usually 0xf0100000) by
0x00ffffff, and loads the file there. Hence the
usual load point is 1 MB (0x00100000). During load,
the boot code toggles back and forth between real and protected mode, to
use the BIOS in real mode.
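The entry-point masking described above can be sketched in C; the constants are the ones given in the text, and the helper name is illustrative only:

```c
#include <stdint.h>

/* boot2 masks the a.out entry point (a high linked address) down to
 * the physical address where the kernel is actually loaded. */
static uint32_t
load_address(uint32_t entry)
{
        return (entry & 0x00ffffff);
}
```

With the usual entry point of 0xf0100000, this yields the 1 MB load point, 0x00100000.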
The boot code itself uses segment selectors 0x18
and 0x20 for %cs and
%ds/%es in protected mode, and
0x28 to jump back into real mode. The kernel is
finally started with %cs 0x08 and
%ds/%es/%ss 0x10, which refer to
dummy descriptors covering the entire address space.
The kernel will be started at its load point. Since it has been
linked for another (high) address, it will have to execute
position-independent code (PIC) until the page table and page
directory are set up properly, at which point paging will be enabled
and the kernel will finally run at the address for which it was
linked.
Contributed by &a.dg;. 16 Apr
1995.
The physical pages immediately following the kernel BSS contain
proc0's page directory, page tables, and upages. Some time later when
the VM system is initialized, the physical memory between
0x1000-0x9ffff and the physical memory after the
kernel (text+data+bss+proc0 stuff+other misc) is made available in the
form of general VM pages and added to the global free page list.
DMA: What it Is and How it Works
Copyright © 1995,1997 &a.uhclem;, All Rights
Reserved. 10 December 1996. Last Update 8 October
1997.
Direct Memory Access (DMA) is a method of allowing data to be moved
from one location to another in a computer without intervention from the
central processor (CPU).
The way that the DMA function is implemented varies between computer
architectures, so this discussion will limit itself to the
implementation and workings of the DMA subsystem on the IBM Personal
Computer (PC), the IBM PC/AT and all of its successors and
clones.
The PC DMA subsystem is based on the Intel 8237 DMA controller. The
8237 contains four DMA channels that can be programmed independently and
any one of the channels may be active at any moment. These channels are
numbered 0, 1, 2 and 3. Starting with the PC/AT, IBM added a second
8237 chip, and numbered those channels 4, 5, 6 and 7.
The original DMA controller (0, 1, 2 and 3) moves one byte in each
transfer. The second DMA controller (4, 5, 6, and 7) moves 16 bits from
two adjacent memory locations in each transfer, with the first byte
always coming from an even-numbered address. The two controllers are
identical components and the difference in transfer size is caused by
the way the second controller is wired into the system.
The 8237 has two electrical signals for each channel, named DRQ and
-DACK. There are additional signals with the names HRQ (Hold Request),
HLDA (Hold Acknowledge), -EOP (End of Process), and the bus control
signals -MEMR (Memory Read), -MEMW (Memory Write), -IOR (I/O Read), and
-IOW (I/O Write).
The 8237 DMA is known as a “fly-by” DMA controller.
This means that the data being moved from one location to another does
not pass through the DMA chip and is not stored in the DMA chip.
Subsequently, the DMA can only transfer data between an I/O port and a
memory address, but not between two I/O ports or two memory
locations.
The 8237 does allow two channels to be connected together to allow
memory-to-memory DMA operations in a non-“fly-by” mode,
but nobody in the PC industry uses this scarce resource this way since
it is faster to move data between memory locations using the
CPU.
In the PC architecture, each DMA channel is normally activated only
when the hardware that uses a given DMA channel requests a transfer by
asserting the DRQ line for that channel.
A Sample DMA transfer
Here is an example of the steps that occur to cause and perform a
DMA transfer. In this example, the floppy disk controller (FDC) has
just read a byte from a diskette and wants the DMA to place it in
memory at location 0x00123456. The process begins by the FDC
asserting the DRQ2 signal (the DRQ line for DMA channel 2) to alert
the DMA controller.
The DMA controller will note that the DRQ2 signal is asserted. The
DMA controller will then make sure that DMA channel 2 has been
programmed and is unmasked (enabled). The DMA controller also makes
sure that none of the other DMA channels are active or want to be
active and have a higher priority. Once these checks are complete,
the DMA asks the CPU to release the bus so that the DMA may use the
bus. The DMA requests the bus by asserting the HRQ signal which goes
to the CPU.
The CPU detects the HRQ signal, and will complete executing the
current instruction. Once the processor has reached a state where it
can release the bus, it will. Now all of the signals normally
generated by the CPU (-MEMR, -MEMW, -IOR, -IOW and a few others) are
placed in a tri-stated condition (neither high nor low) and then the
CPU asserts the HLDA signal which tells the DMA controller that it is
now in charge of the bus.
Depending on the processor, the CPU may be able to execute a few
additional instructions now that it no longer has the bus, but the CPU
will eventually have to wait when it reaches an instruction that must
read something from memory that is not in the internal processor cache
or pipeline.
Now that the DMA “is in charge”, the DMA activates its
-MEMR, -MEMW, -IOR, -IOW output signals, and the address outputs from
the DMA are set to 0x3456, which will be used to direct the byte that
is about to be transferred to a specific memory location.
The DMA will then let the device that requested the DMA transfer
know that the transfer is commencing. This is done by asserting the
-DACK signal, or in the case of the floppy disk controller, -DACK2 is
asserted.
The floppy disk controller is now responsible for placing the byte
to be transferred on the bus Data lines. Unless the floppy controller
needs more time to get the data byte on the bus (and if the peripheral
does need more time it alerts the DMA via the READY signal), the DMA
will wait one DMA clock, and then de-assert the -MEMW and -IOR signals
so that the memory will latch and store the byte that was on the bus,
and the FDC will know that the byte has been transferred.
Since the DMA cycle only transfers a single byte at a time, the
FDC now drops the DRQ2 signal, so the DMA knows that it is no longer
needed. The DMA will de-assert the -DACK2 signal, so that the FDC
knows it must stop placing data on the bus.
The DMA will now check to see if any of the other DMA channels
have any work to do. If none of the channels have their DRQ lines
asserted, the DMA controller has completed its work and will now
tri-state the -MEMR, -MEMW, -IOR, -IOW and address signals.
Finally, the DMA will de-assert the HRQ signal. The CPU sees
this, and de-asserts the HLDA signal. Now the CPU activates its
-MEMR, -MEMW, -IOR, -IOW and address lines, and it resumes executing
instructions and accessing main memory and the peripherals.
For a typical floppy disk sector, the above process is repeated
512 times, once for each byte. Each time a byte is transferred, the
address register in the DMA is incremented and the counter in the DMA
that shows how many bytes are to be transferred is decremented.
When the counter reaches zero, the DMA asserts the EOP signal,
which indicates that the counter has reached zero and no more data
will be transferred until the DMA controller is reprogrammed by the
CPU. This event is also called the Terminal Count (TC). There is only
one EOP signal, and since only one DMA channel can be active at any
instant, the DMA channel that is currently active must be the DMA
channel that just completed its task.
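The count behavior can be modeled in a few lines of C. This is a toy sketch, not driver code: the register is loaded with one less than the number of bytes, and terminal count fires on the transfer that takes it past zero.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of a DMA channel's count register: loaded with
 * (bytes - 1) and decremented once per transfer; terminal count (TC)
 * fires on the transfer that takes it past zero. */
static bool
dma_transfer_one(uint16_t *count)
{
        bool tc = (*count == 0);

        (*count)--;             /* wraps to 0xffff on the final byte */
        return (tc);
}
```

Loaded with 511, the channel moves 512 bytes and signals TC on the last one.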
If a peripheral wants to generate an interrupt when the transfer
of a buffer is complete, it can test for its -DACKn signal and the EOP
signal both being asserted at the same time. When that happens, it
means the DMA will not transfer any more information for that
peripheral without intervention by the CPU. The peripheral can then
assert one of the interrupt signals to get the processor's attention.
In the PC architecture, the DMA chip itself is not capable of
generating an interrupt. The peripheral and its associated hardware
is responsible for generating any interrupt that occurs.
Subsequently, it is possible to have a peripheral that uses DMA but
does not use interrupts.
It is important to understand that although the CPU always
releases the bus to the DMA when the DMA makes the request, this
action is invisible to both applications and the operating systems,
except for slight changes in the amount of time the processor takes to
execute instructions when the DMA is active. Subsequently, the
processor must poll the peripheral, poll the registers in the DMA
chip, or receive an interrupt from the peripheral to know for certain
when a DMA transfer has completed.
DMA Page Registers and 16Meg address space limitations
You may have noticed that instead of setting the address lines to
0x00123456, as stated earlier, the DMA only set 0x3456. The reason
for this takes a bit of explaining.
When the original IBM PC was designed, IBM elected to use both DMA
and interrupt controller chips that were designed for use with the
8085, an 8-bit processor with an address space of 16 bits (64K).
Since the IBM PC supported more than 64K of memory, something had to
be done to allow the DMA to read or write memory locations above the
64K mark. What IBM did to solve this problem was to add an external
data latch for each DMA channel that holds the upper bits of the
address to be read from or written to. Whenever a DMA channel is
active, the contents of that latch are written to the address bus and
kept there until the DMA operation for the channel ends. IBM called
these latches “Page Registers”.
So for our example above, the DMA would put the 0x3456 part of the
address on the bus, and the Page Register for DMA channel 2 would put
0x0012xxxx on the bus. Together, these two values form the complete
address in memory that is to be accessed.
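The way the Page Register and the 8237's 16-bit address combine can be shown with a one-line helper (the function name is illustrative):

```c
#include <stdint.h>

/* The 8237 drives address bits 15-0 and the external Page Register
 * drives bits 23-16; the two are placed on the bus side by side
 * (effectively OR-ed), not added like 8086 segment:offset pairs. */
static uint32_t
dma_bus_address(uint8_t page, uint16_t offset)
{
        return (((uint32_t)page << 16) | offset);
}
```

For the example transfer, dma_bus_address(0x12, 0x3456) produces 0x00123456.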
Because the Page Register latch is independent of the DMA chip,
the area of memory to be read or written must not span a 64K physical
boundary. For example, if the DMA accesses memory location 0xffff,
after that transfer the DMA will then increment the address register
and the DMA will access the next byte at location 0x0000, not 0x10000.
The results of letting this happen are probably not intended.
“Physical” 64K boundaries should not be confused
with 8086-mode 64K “Segments”, which are created by
mathematically adding a segment register with an offset register.
Page Registers have no address overlap and are mathematically OR-ed
together.
To further complicate matters, the external DMA address latches on
the PC/AT hold only eight bits, so that gives us 8+16=24 bits, which
means that the DMA can only point at memory locations between 0 and
16Meg. For newer computers that allow more than 16Meg of memory, the
standard PC-compatible DMA cannot access memory locations above
16Meg.
To get around this restriction, operating systems will reserve a
RAM buffer in an area below 16Meg that also does not span a physical
64K boundary. Then the DMA will be programmed to transfer data from
the peripheral and into that buffer. Once the DMA has moved the data
into this buffer, the operating system will then copy the data from
the buffer to the address where the data is really supposed to be
stored.
When writing data from an address above 16Meg to a DMA-based
peripheral, the data must be first copied from where it resides into a
buffer located below 16Meg, and then the DMA can copy the data from
the buffer to the hardware. In FreeBSD, these reserved buffers are
called “Bounce Buffers”. In the MS-DOS world, they are
sometimes called “Smart Buffers”.
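The decision of whether a buffer needs bouncing can be sketched as a check of the two constraints just described. This helper is hypothetical (the real logic lives in the kernel's DMA support code), and it assumes a transfer length of at least one byte:

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_LIMIT   0x1000000UL   /* ISA DMA cannot address above 16 MB */
#define DMA_ALIGN   0x10000UL     /* transfers must not cross 64K */

/* A buffer qualifies for direct DMA only if it lies entirely below
 * 16 MB and does not cross a physical 64K boundary; len must be >= 1. */
static bool
dma_needs_bounce(uint32_t paddr, uint32_t len)
{
        if (paddr + len > DMA_LIMIT)
                return (true);                  /* above the 16 MB limit */
        if ((paddr & ~(DMA_ALIGN - 1)) !=
            ((paddr + len - 1) & ~(DMA_ALIGN - 1)))
                return (true);                  /* crosses a 64K boundary */
        return (false);
}
```

For example, a two-byte transfer starting at 0xffff crosses a 64K boundary and must be bounced, while 512 bytes starting at exactly 0x10000 can go directly.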
A newer implementation of the 8237, called the 82374, allows 16
bits of page register to be specified, allowing access to the entire
32-bit address space without the use of bounce buffers.
DMA Operational Modes and Settings
The 8237 DMA can be operated in several modes. The main ones
are:
Single
A single byte (or word) is transferred. The DMA must
release and re-acquire the bus for each additional byte. This is
commonly used by devices that cannot transfer the entire block
of data immediately. The peripheral will request the DMA each
time it is ready for another transfer.
The standard PC-compatible floppy disk controller (NEC 765)
only has a one-byte buffer, so it uses this mode.
Block/Demand
Once the DMA acquires the system bus, an entire block of
data is transferred, up to a maximum of 64K. If the peripheral
needs additional time, it can assert the READY signal to suspend
the transfer briefly. READY should not be used excessively, and
for slow peripheral transfers, the Single Transfer Mode should
be used instead.
The difference between Block and Demand is that once a Block
transfer is started, it runs until the transfer count reaches
zero. DRQ only needs to be asserted until -DACK is asserted.
Demand Mode will continue to transfer bytes until DRQ is
de-asserted, at which point the DMA suspends the transfer and
releases the bus back to the CPU. When DRQ is asserted later,
the transfer resumes where it was suspended.
Older hard disk controllers used Demand Mode until CPU
speeds increased to the point that it was more efficient to
transfer the data using the CPU, particularly if the memory
locations used in the transfer were above the 16Meg mark.
Cascade
This mechanism allows a DMA channel to request the bus, but
then the attached peripheral device is responsible for placing
the addressing information on the bus instead of the DMA. This
is also used to implement a technique known as “Bus
Mastering”.
When a DMA channel in Cascade Mode receives control of the
bus, the DMA does not place addresses and I/O control signals on
the bus like the DMA normally does when it is active. Instead,
the DMA only asserts the -DACK signal for the active DMA
channel.
At this point it is up to the peripheral connected to that
DMA channel to provide address and bus control signals. The
peripheral has complete control over the system bus, and can do
reads and/or writes to any address below 16Meg. When the
peripheral is finished with the bus, it de-asserts the DRQ line,
and the DMA controller can then return control to the CPU or to
some other DMA channel.
Cascade Mode can be used to chain multiple DMA controllers
together, and this is exactly what DMA Channel 4 is used for in
the PC architecture. When a peripheral requests the bus on DMA
channels 0, 1, 2 or 3, the slave DMA controller asserts HRQ,
but this wire is actually connected to DRQ4 on the primary DMA
controller instead of to the CPU. The primary DMA controller,
thinking it has work to do on Channel 4, requests the bus from
the CPU using the HRQ signal. Once the CPU grants the bus to the
primary DMA controller, -DACK4 is asserted, and that wire is
actually connected to the HLDA signal on the slave DMA
controller. The slave DMA controller then transfers data for
the DMA channel that requested it (0, 1, 2 or 3), or the slave
DMA may grant the bus to a peripheral that wants to perform its
own bus-mastering, such as a SCSI controller.
Because of this wiring arrangement, only DMA channels 0, 1,
2, 3, 5, 6 and 7 are usable with peripherals on PC/AT
systems.
DMA channel 0 was reserved for refresh operations in early
IBM PC computers, but is generally available for use by
peripherals in modern systems.
When a peripheral is performing Bus Mastering, it is
important that the peripheral transmit data to or from memory
constantly while it holds the system bus. If the peripheral
cannot do this, it must release the bus frequently so that the
system can perform refresh operations on main memory.
The Dynamic RAM used in all PCs for main memory must be
accessed frequently to keep the bits stored in the components
“charged”. Dynamic RAM essentially consists of
millions of capacitors with each one holding one bit of data.
These capacitors are charged with power to represent a
1 or drained to represent a
0. Because all capacitors leak, power must
be added at regular intervals to keep the 1
values intact. The RAM chips actually handle the task of
pumping power back into all of the appropriate locations in RAM,
but they must be told when to do it by the rest of the computer
so that the refresh activity won't interfere with the computer
wanting to access RAM normally. If the computer is unable to
refresh memory, the contents of memory will become corrupted in
just a few milliseconds.
Since memory read and write cycles “count” as
refresh cycles (a dynamic RAM refresh cycle is actually an
incomplete memory read cycle), as long as the peripheral
controller continues reading or writing data to sequential
memory locations, that action will refresh all of memory.
Bus-mastering is found in some SCSI host interfaces and
other high-performance peripheral controllers.
Autoinitialize
This mode causes the DMA to perform Byte, Block or Demand
transfers, but when the DMA transfer counter reaches zero, the
counter and address are set back to where they were when the DMA
channel was originally programmed. This means that as long as
the peripheral requests transfers, they will be granted. It is
up to the CPU to move new data into the fixed buffer ahead of
where the DMA is about to transfer it when doing output
operations, and read new data out of the buffer behind where the
DMA is writing when doing input operations.
This technique is frequently used on audio devices that have
small or no hardware “sample” buffers. There is
additional CPU overhead to manage this “circular”
buffer, but in some cases this may be the only way to eliminate
the latency that occurs when the DMA counter reaches zero and
the DMA stops transfers until it is reprogrammed.
Programming the DMA
The DMA channel that is to be programmed should always be
“masked” before loading any settings. This is because the
hardware might unexpectedly assert the DRQ for that channel, and the
DMA might respond, even though not all of the parameters have been
loaded or updated.
Once masked, the host must specify the direction of the transfer
(memory-to-I/O or I/O-to-memory), what mode of DMA operation is to be
used for the transfer (Single, Block, Demand, Cascade, etc.), and
finally the address and length of the transfer are loaded. The length
that is loaded is one less than the amount you expect the DMA to
transfer. The LSB and MSB of the address and length are written to
the same 8-bit I/O port, so another port must be written to first to
guarantee that the DMA accepts the first byte as the LSB and the
second byte as the MSB of the length and address.
Then, be sure to update the Page Register, which is external to
the DMA and is accessed through a different set of I/O ports.
Once all the settings are ready, the DMA channel can be un-masked.
That DMA channel is now considered to be “armed”, and will
respond when the DRQ line for that channel is asserted.
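The sequence just described can be sketched for channel 2 (the floppy channel). The port numbers come from the port map below; outb() here is a stand-in that merely records the writes so the sequence can be inspected, where a real driver would use the machine's actual port-output primitive:

```c
#include <stdint.h>

#define DMA1_MASK       0x0a    /* Single Mask Register Bit */
#define DMA1_MODE       0x0b    /* Mode Register */
#define DMA1_CLRFF      0x0c    /* Clear LSB/MSB Flip-Flop */
#define DMA1_CH2_ADDR   0x04    /* Channel 2 starting address */
#define DMA1_CH2_COUNT  0x05    /* Channel 2 starting word count */
#define DMA_PAGE_CH2    0x81    /* Channel 2 page register */

/* Stand-in for the real port write; it just records the sequence. */
static uint16_t io_port[16];
static uint8_t  io_val[16];
static int      io_n;

static void
outb(uint16_t port, uint8_t val)
{
        io_port[io_n] = port;
        io_val[io_n] = val;
        io_n++;
}

/* Program channel 2 for a Single Mode, I/O-to-memory transfer
 * (e.g. a floppy read) of len bytes starting at physical paddr. */
static void
dma2_setup(uint32_t paddr, uint16_t len)
{
        uint16_t count = len - 1;       /* count is one less than length */

        outb(DMA1_MASK, 0x04 | 2);      /* mask channel 2 before loading */
        outb(DMA1_MODE, 0x44 | 2);      /* single mode, write to memory */
        outb(DMA1_CLRFF, 0);            /* force next write to be the LSB */
        outb(DMA1_CH2_ADDR, paddr & 0xff);          /* address LSB */
        outb(DMA1_CH2_ADDR, (paddr >> 8) & 0xff);   /* address MSB */
        outb(DMA_PAGE_CH2, (paddr >> 16) & 0xff);   /* external page latch */
        outb(DMA1_CH2_COUNT, count & 0xff);         /* count LSB */
        outb(DMA1_CH2_COUNT, (count >> 8) & 0xff);  /* count MSB */
        outb(DMA1_MASK, 2);             /* un-mask: channel is now armed */
}
```

For the 512-byte sector transfer to 0x123456 used in the example, this produces: mask, mode 0x46, flip-flop reset, address 0x56/0x34, page 0x12, count 0xff/0x01, un-mask.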
Refer to a hardware data book for precise programming details for
the 8237. You will also need to refer to the I/O port map for the PC
system, which describes where the DMA and Page Register ports are
located. A complete port map table is located below.
DMA Port Map
All systems based on the IBM-PC and PC/AT have the DMA hardware
located at the same I/O ports. The complete list is provided below.
Ports assigned to DMA Controller #2 are undefined on non-AT
designs.
0x00–0x1f DMA Controller #1 (Channels 0, 1, 2 and
3)
DMA Address and Count Registers
0x00
write
Channel 0 starting address
0x00
read
Channel 0 current address
0x01
write
Channel 0 starting word count
0x01
read
Channel 0 remaining word count
0x02
write
Channel 1 starting address
0x02
read
Channel 1 current address
0x03
write
Channel 1 starting word count
0x03
read
Channel 1 remaining word count
0x04
write
Channel 2 starting address
0x04
read
Channel 2 current address
0x05
write
Channel 2 starting word count
0x05
read
Channel 2 remaining word count
0x06
write
Channel 3 starting address
0x06
read
Channel 3 current address
0x07
write
Channel 3 starting word count
0x07
read
Channel 3 remaining word count
DMA Command Registers
0x08
write
Command Register
0x08
read
Status Register
0x09
write
Request Register
0x09
read
-
0x0a
write
Single Mask Register Bit
0x0a
read
-
0x0b
write
Mode Register
0x0b
read
-
0x0c
write
Clear LSB/MSB Flip-Flop
0x0c
read
-
0x0d
write
Master Clear/Reset
0x0d
read
Temporary Register (not available on newer
versions)
0x0e
write
Clear Mask Register
0x0e
read
-
0x0f
write
Write All Mask Register Bits
0x0f
read
Read All Mask Register Bits (only in Intel
82374)
0xc0–0xdf DMA Controller #2 (Channels 4, 5, 6 and
7)
DMA Address and Count Registers
0xc0
write
Channel 4 starting address
0xc0
read
Channel 4 current address
0xc2
write
Channel 4 starting word count
0xc2
read
Channel 4 remaining word count
0xc4
write
Channel 5 starting address
0xc4
read
Channel 5 current address
0xc6
write
Channel 5 starting word count
0xc6
read
Channel 5 remaining word count
0xc8
write
Channel 6 starting address
0xc8
read
Channel 6 current address
0xca
write
Channel 6 starting word count
0xca
read
Channel 6 remaining word count
0xcc
write
Channel 7 starting address
0xcc
read
Channel 7 current address
0xce
write
Channel 7 starting word count
0xce
read
Channel 7 remaining word count
DMA Command Registers
0xd0
write
Command Register
0xd0
read
Status Register
0xd2
write
Request Register
0xd2
read
-
0xd4
write
Single Mask Register Bit
0xd4
read
-
0xd6
write
Mode Register
0xd6
read
-
0xd8
write
Clear LSB/MSB Flip-Flop
0xd8
read
-
0xda
write
Master Clear/Reset
0xda
read
Temporary Register (not present in Intel
82374)
0xdc
write
Clear Mask Register
0xdc
read
-
0xde
write
Write All Mask Register Bits
0xde
read
Read All Mask Register Bits (only in Intel
82374)
0x80–0x9f DMA Page Registers
0x87
r/w
Channel 0 Low byte (23-16) page Register
0x83
r/w
Channel 1 Low byte (23-16) page Register
0x81
r/w
Channel 2 Low byte (23-16) page Register
0x82
r/w
Channel 3 Low byte (23-16) page Register
0x8b
r/w
Channel 5 Low byte (23-16) page Register
0x89
r/w
Channel 6 Low byte (23-16) page Register
0x8a
r/w
Channel 7 Low byte (23-16) page Register
0x8f
r/w
Low byte page Refresh
0x400–0x4ff 82374 Enhanced DMA Registers
The Intel 82374 EISA System Component (ESC) was introduced in
early 1996 and includes a DMA controller that provides a superset of
8237 functionality as well as other PC-compatible core peripheral
components in a single package. This chip is targeted at both EISA
and PCI platforms, and provides modern DMA features such as
scatter-gather, ring buffers, and direct access by the system
DMA to all 32 bits of address space.
If these features are used, code should also be included to
provide similar functionality in the previous 16 years' worth of
PC-compatible computers. For compatibility reasons, some of the
82374 registers must be programmed after
programming the traditional 8237 registers for each transfer.
Writing to a traditional 8237 register forces the contents of some
of the 82374 enhanced registers to zero to provide backward software
compatibility.
0x401
r/w
Channel 0 High byte (bits 23-16) word count
0x403
r/w
Channel 1 High byte (bits 23-16) word count
0x405
r/w
Channel 2 High byte (bits 23-16) word count
0x407
r/w
Channel 3 High byte (bits 23-16) word count
0x4c6
r/w
Channel 5 High byte (bits 23-16) word count
0x4ca
r/w
Channel 6 High byte (bits 23-16) word count
0x4ce
r/w
Channel 7 High byte (bits 23-16) word count
0x487
r/w
Channel 0 High byte (bits 31-24) page Register
0x483
r/w
Channel 1 High byte (bits 31-24) page Register
0x481
r/w
Channel 2 High byte (bits 31-24) page Register
0x482
r/w
Channel 3 High byte (bits 31-24) page Register
0x48b
r/w
Channel 5 High byte (bits 31-24) page Register
0x489
r/w
Channel 6 High byte (bits 31-24) page Register
0x48a
r/w
Channel 7 High byte (bits 31-24) page Register
0x48f
r/w
High byte page Refresh
0x4e0
r/w
Channel 0 Stop Register (bits 7-2)
0x4e1
r/w
Channel 0 Stop Register (bits 15-8)
0x4e2
r/w
Channel 0 Stop Register (bits 23-16)
0x4e4
r/w
Channel 1 Stop Register (bits 7-2)
0x4e5
r/w
Channel 1 Stop Register (bits 15-8)
0x4e6
r/w
Channel 1 Stop Register (bits 23-16)
0x4e8
r/w
Channel 2 Stop Register (bits 7-2)
0x4e9
r/w
Channel 2 Stop Register (bits 15-8)
0x4ea
r/w
Channel 2 Stop Register (bits 23-16)
0x4ec
r/w
Channel 3 Stop Register (bits 7-2)
0x4ed
r/w
Channel 3 Stop Register (bits 15-8)
0x4ee
r/w
Channel 3 Stop Register (bits 23-16)
0x4f4
r/w
Channel 5 Stop Register (bits 7-2)
0x4f5
r/w
Channel 5 Stop Register (bits 15-8)
0x4f6
r/w
Channel 5 Stop Register (bits 23-16)
0x4f8
r/w
Channel 6 Stop Register (bits 7-2)
0x4f9
r/w
Channel 6 Stop Register (bits 15-8)
0x4fa
r/w
Channel 6 Stop Register (bits 23-16)
0x4fc
r/w
Channel 7 Stop Register (bits 7-2)
0x4fd
r/w
Channel 7 Stop Register (bits 15-8)
0x4fe
r/w
Channel 7 Stop Register (bits 23-16)
0x40a
write
Channels 0-3 Chaining Mode Register
0x40a
read
Channel Interrupt Status Register
0x4d4
write
Channels 4-7 Chaining Mode Register
0x4d4
read
Chaining Mode Status
0x40c
read
Chain Buffer Expiration Control Register
0x410
write
Channel 0 Scatter-Gather Command Register
0x411
write
Channel 1 Scatter-Gather Command Register
0x412
write
Channel 2 Scatter-Gather Command Register
0x413
write
Channel 3 Scatter-Gather Command Register
0x415
write
Channel 5 Scatter-Gather Command Register
0x416
write
Channel 6 Scatter-Gather Command Register
0x417
write
Channel 7 Scatter-Gather Command Register
0x418
read
Channel 0 Scatter-Gather Status Register
0x419
read
Channel 1 Scatter-Gather Status Register
0x41a
read
Channel 2 Scatter-Gather Status Register
0x41b
read
Channel 3 Scatter-Gather Status Register
0x41d
read
Channel 5 Scatter-Gather Status Register
0x41e
read
Channel 6 Scatter-Gather Status Register
0x41f
read
Channel 7 Scatter-Gather Status Register
0x420-0x423
r/w
Channel 0 Scatter-Gather Descriptor Table Pointer
Register
0x424-0x427
r/w
Channel 1 Scatter-Gather Descriptor Table Pointer
Register
0x428-0x42b
r/w
Channel 2 Scatter-Gather Descriptor Table Pointer
Register
0x42c-0x42f
r/w
Channel 3 Scatter-Gather Descriptor Table Pointer
Register
0x434-0x437
r/w
Channel 5 Scatter-Gather Descriptor Table Pointer
Register
0x438-0x43b
r/w
Channel 6 Scatter-Gather Descriptor Table Pointer
Register
0x43c-0x43f
r/w
Channel 7 Scatter-Gather Descriptor Table Pointer
Register
The FreeBSD VM System
Contributed by &a.dillon;. 6 Feb 1999
Management of physical
memory—vm_page_t
Physical memory is managed on a page-by-page basis through the
vm_page_t structure. Pages of physical memory are
categorized through the placement of their respective
vm_page_t structures on one of several paging
queues.
A page can be in a wired, active, inactive, cache, or free state.
Except for the wired state, the page is typically placed in a
doubly-linked list queue representing the state that it is in. Wired pages
are not placed on any queue.
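A greatly simplified sketch of the idea follows; the field names are illustrative, not the real vm_page_t layout:

```c
#include <stddef.h>

/* Illustrative only: a page is in exactly one state, and every state
 * except "wired" implies membership in the matching doubly-linked
 * queue.  The real vm_page_t carries much more than this. */
enum page_queue { PQ_NONE, PQ_ACTIVE, PQ_INACTIVE, PQ_CACHE, PQ_FREE };

struct vm_page {
        struct vm_page  *prev, *next;   /* queue linkage */
        enum page_queue queue;          /* which paging queue, if any */
        int             wire_count;     /* wired pages sit on no queue */
};

/* A page is queued when it is unwired and assigned to some queue. */
static int
vm_page_queued(const struct vm_page *m)
{
        return (m->wire_count == 0 && m->queue != PQ_NONE);
}
```

Wiring a page would thus mean removing it from whatever queue it is on before bumping its wire count.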
FreeBSD implements a more involved paging queue for cached and
free pages in order to implement page coloring. Each of these states
involves multiple queues arranged according to the size of the
processor's L1 and L2 caches. When a new page needs to be allocated,
FreeBSD attempts to obtain one that is reasonably well aligned from
the point of view of the L1 and L2 caches relative to the VM object
the page is being allocated for.
Additionally, a page may be held with a reference count or locked
with a busy count. The VM system also implements an “ultimate
locked” state for a page using the PG_BUSY bit in the page's
flags.
In general terms, each of the paging queues operates in a LRU
fashion. A page is typically placed in a wired or active state
initially. When wired, the page is usually associated with a page
table somewhere. The VM system ages the page by scanning pages in a
more active paging queue (LRU) in order to move them to a less-active
paging queue. Pages that get moved into the cache are still
associated with a VM object but are candidates for immediate reuse.
Pages in the free queue are truly free. FreeBSD attempts to minimize
the number of pages in the free queue, but a certain minimum number of
truly free pages must be maintained in order to accommodate page
allocation at interrupt time.
If a process attempts to access a page that does not exist in its
page table but does exist in one of the paging queues (such as the
inactive or cache queues), a relatively inexpensive page reactivation
fault occurs which causes the page to be reactivated. If the page
does not exist in system memory at all, the process must block while
the page is brought in from disk.
FreeBSD dynamically tunes its paging queues and attempts to
maintain reasonable ratios of pages in the various queues as well as
attempts to maintain a reasonable breakdown of clean vs. dirty pages.
The amount of rebalancing that occurs depends on the system's memory
load. This rebalancing is implemented by the pageout daemon and
involves laundering dirty pages (syncing them with their backing
store), noticing when pages are actively referenced (resetting their
position in the LRU queues or moving them between queues), migrating
pages between queues when the queues are out of balance, and so forth.
FreeBSD's VM system is willing to take a reasonable number of
reactivation page faults to determine how active or how idle a page
actually is. This leads to better decisions being made as to when to
launder or swap-out a page.
The unified buffer
cache—vm_object_t
FreeBSD implements the idea of a generic “VM object”.
VM objects can be associated with backing store of various
types—unbacked, swap-backed, physical device-backed, or
file-backed storage. Since the filesystem uses the same VM objects to
manage in-core data relating to files, the result is a unified buffer
cache.
VM objects can be shadowed. That is, they
can be stacked on top of each other. For example, you might have a
swap-backed VM object stacked on top of a file-backed VM object in
order to implement a MAP_PRIVATE mmap()ing. This stacking is also
used to implement various sharing properties, including copy-on-write,
for forked address spaces.
It should be noted that a vm_page_t can only be
associated with one VM object at a time. The VM object shadowing
implements the perceived sharing of the same page across multiple
instances.
Filesystem I/O—struct buf
vnode-backed VM objects, such as file-backed objects, generally
need to maintain their own clean/dirty info independent from the VM
system's idea of clean/dirty. For example, when the VM system decides
to synchronize a physical page to its backing store, the VM system
needs to mark the page clean before the page is actually written to
its backing store. Additionally, filesystems need to be able to map
portions of a file or file metadata into KVM in order to operate on
it.
The entities used to manage this are known as filesystem buffers,
struct buf's, and also known as
bp's. When a filesystem needs to operate on a
portion of a VM object, it typically maps part of the object into a
struct buf and then maps the pages in the struct buf into KVM. In the
same manner, disk I/O is typically issued by mapping portions of
objects into buffer structures and then issuing the I/O on the buffer
structures. The underlying vm_page_t's are typically busied for the
duration of the I/O. Filesystem buffers also have their own notion of
being busy, which is useful to filesystem driver code which would
rather operate on filesystem buffers instead of hard VM pages.
FreeBSD reserves a limited amount of KVM to hold mappings from
struct bufs, but it should be made clear that this KVM is used solely
to hold mappings and does not limit the ability to cache data.
Physical data caching is strictly a function of
vm_page_t's, not filesystem buffers. However,
since filesystem buffers are used to placehold I/O, they do inherently
limit the amount of concurrent I/O possible. As there are usually a
few thousand filesystem buffers available, this is not usually a
problem.
Mapping Page Tables—vm_map_t, vm_entry_t
FreeBSD separates the physical page table topology from the VM
system. All hard per-process page tables can be reconstructed on the
fly and are usually considered throwaway. Special page tables such as
those managing KVM are typically permanently preallocated. These page
tables are not throwaway.
FreeBSD associates portions of vm_objects with address ranges in
virtual memory through vm_map_t and
vm_entry_t structures. Page tables are directly
synthesized from the
vm_map_t/vm_entry_t/
vm_object_t hierarchy. Remember when I mentioned
that physical pages are only directly associated with a
vm_object? Well, that is not quite true.
vm_page_t's are also linked into page tables that
they are actively associated with. One vm_page_t
can be linked into several pmaps, as page tables
are called. However, the hierarchical association holds so all
references to the same page in the same object reference the same
vm_page_t and thus give us buffer cache unification
across the board.
KVM Memory Mapping
FreeBSD uses KVM to hold various kernel structures. The single
largest entity held in KVM is the filesystem buffer cache. That is,
mappings relating to struct buf entities.
Unlike Linux, FreeBSD does NOT map all of physical memory into
KVM. This means that FreeBSD can handle memory configurations up to
4G on 32 bit platforms. In fact, if the MMU were capable of it,
FreeBSD could theoretically handle memory configurations up to 8TB on
a 32 bit platform. However, since most 32 bit platforms are only
capable of mapping 4GB of RAM, this is a moot point.
KVM is managed through several mechanisms. The main mechanism
used to manage KVM is the zone allocator. The
zone allocator takes a chunk of KVM and splits it up into
constant-sized blocks of memory in order to allocate a specific type
of structure. You can use vmstat -m to get an
overview of current KVM utilization broken down by zone.
Tuning the FreeBSD VM system
A concerted effort has been made to make the FreeBSD kernel
dynamically tune itself. Typically you do not need to mess with
anything beyond the maxusers and
NMBCLUSTERS kernel config options. That is, kernel
compilation options specified in (typically)
/usr/src/sys/i386/conf/CONFIG_FILE.
A description of all available kernel configuration options can be
found in /usr/src/sys/i386/conf/LINT.
In a large system configuration you may wish to increase
maxusers. Values typically range from 10 to 128.
Note that raising maxusers too high can cause the
system to overflow available KVM resulting in unpredictable operation.
It is better to leave maxusers at some reasonable number and add other
options, such as NMBCLUSTERS, to increase specific
resources.
If your system is going to use the network heavily, you may want
to increase NMBCLUSTERS. Typical values range from
1024 to 4096.
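For example, the relevant lines in a kernel configuration file might
look like this (the values shown are only an illustration, not a
recommendation for any particular machine):

```
maxusers        64
options         NMBCLUSTERS=4096
```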
The NBUF parameter is also traditionally used
to scale the system. This parameter determines the amount of KVA the
system can use to map filesystem buffers for I/O. Note that this
parameter has nothing whatsoever to do with the unified buffer cache!
This parameter is dynamically tuned in 3.0-CURRENT and later kernels
and should generally not be adjusted manually. We recommend that you
not try to specify an NBUF
parameter. Let the system pick it. Too small a value can result in
extremely inefficient filesystem operation while too large a value can
starve the page queues by causing too many pages to become wired
down.
By default, FreeBSD kernels are not optimized. You can set
debugging and optimization flags with the
makeoptions directive in the kernel configuration.
Note that you should not use -g unless you can
accommodate the large (typically 7 MB+) kernels that result.
makeoptions DEBUG="-g"
makeoptions COPTFLAGS="-O2 -pipe"
Sysctl provides a way to tune kernel parameters at run-time. You
typically do not need to mess with any of the sysctl variables,
especially the VM related ones.
Run time VM and system tuning is relatively straightforward.
First, use softupdates on your UFS/FFS filesystems whenever possible.
/usr/src/contrib/sys/softupdates/README contains
instructions (and restrictions) on how to configure it.
Second, configure sufficient swap. You should have a swap
partition configured on each physical disk, up to four, even on your
“work” disks. You should have at least 2x the swap space
as you have main memory, and possibly even more if you do not have a
lot of memory. You should also size your swap partition based on the
maximum memory configuration you ever intend to put on the machine so
you do not have to repartition your disks later on. If you want to be
able to accommodate a crash dump, your first swap partition must be at
least as large as main memory and /var/crash must
have sufficient free space to hold the dump.
NFS-based swap is perfectly acceptable on FreeBSD 4.x or later systems,
but you must be aware that the NFS server will take the brunt of the
paging load.
diff --git a/en_US.ISO_8859-1/books/handbook/internals/chapter.sgml b/en_US.ISO_8859-1/books/handbook/internals/chapter.sgml
index 565af21b0b..df6042c920 100644
--- a/en_US.ISO_8859-1/books/handbook/internals/chapter.sgml
+++ b/en_US.ISO_8859-1/books/handbook/internals/chapter.sgml
@@ -1,1864 +1,1864 @@
FreeBSD Internals
The FreeBSD Booting Process
Contributed by &a.phk;. v1.1, April
26th.
Booting FreeBSD is essentially a three step process: load the
kernel, determine the root filesystem and initialize user-land things.
This leads to some interesting possibilities shown below.
Loading a kernel
We presently have three basic mechanisms for loading the kernel as
described below: they all pass some information to the kernel to help
the kernel decide what to do next.
Biosboot
Biosboot is our “bootblocks”. It consists of
two files which will be installed in the first 8Kbytes of the
floppy or hard-disk slice to be booted from.
Biosboot can load a kernel from a FreeBSD filesystem.
Dosboot
Dosboot was written by DI. Christian Gusenbauer, and is
unfortunately at this time one of the few pieces of code that
will not compile under FreeBSD itself because it is written for
Microsoft compilers.
Dosboot will boot the kernel from a MS-DOS file or from a
FreeBSD filesystem partition on the disk. It attempts to
negotiate with the various and strange kinds of memory manglers
that lurk in high memory on MS/DOS systems and usually wins them
for its case.
Netboot
Netboot will try to find a supported Ethernet card, and use
BOOTP, TFTP and NFS to find a kernel file to boot.
Determine the root filesystem
Once the kernel is loaded and the boot-code jumps to it, the
kernel will initialize itself, trying to determine what hardware is
present and so on; it then needs to find a root filesystem.
Presently we support the following types of root
filesystems:
UFS
This is the most normal type of root filesystem. It can
reside on a floppy or on hard disk.
MSDOS
While this is technically possible, it is not particularly
useful because of the FAT filesystem's
inability to deal with links, device nodes and other such
“UNIXisms”.
MFS
This is actually a UFS filesystem which has been compiled
into the kernel. That means that the kernel does not really
need any hard disks, floppies or other hardware to
function.
CD9660
This is for using a CD-ROM as root filesystem.
NFS
This is for using a fileserver as root filesystem, basically
making it a diskless machine.
Initialize user-land things
To get the user-land going, the kernel, when it has finished
initialization, will create a process with pid == 1
and execute a program on the root filesystem; this program is normally
/sbin/init.
You can substitute any program for /sbin/init,
as long as you keep in mind that:
there is no stdin/out/err unless you open it yourself. If you
exit, the machine panics. Signal handling is special for pid
== 1.
An example of this is the /stand/sysinstall
program on the installation floppy.
Interesting combinations
Boot a kernel with a MFS in it with a special
/sbin/init which...
A — Using DOS
mounts your C: as
/C:
Attaches C:/freebsd.fs on
/dev/vn0
mounts /dev/vn0 as
/rootfs
makes symlinks
/rootfs/bin ->
/bin
/rootfs/etc ->
/etc
/rootfs/sbin ->
/sbin (etc...)
Now you are running FreeBSD without repartitioning your hard
disk...
B — Using NFS
NFS mounts your server:~you/FreeBSD as
/nfs, chroots to /nfs
and executes /sbin/init there
Now you are running FreeBSD diskless, even though you do not
control the NFS server...
C — Start an X-server
Now you have an X-terminal, which is better than that dingy
X-under-windows-so-slow-you-can-see-what-it-does thing that your
boss insists is better than forking out money on hardware.
D — Using a tape
Takes a copy of /dev/rwd0 and writes it
to a remote tape station or fileserver.
Now you finally get that backup you should have made a year
ago...
E — Acts as a firewall/web-server/what do I
know...
This is particularly interesting since you can boot from a
write-protected floppy, but still write to your root
filesystem...
PC Memory Utilization
Contributed by &a.joerg;. 16 Apr
1995.
A short description of how FreeBSD uses memory on the i386
platform
The boot sector will be loaded at 0:0x7c00, and
relocates itself immediately to 0x7c0:0. (This is
nothing magic, just an adjustment for the %cs
selector, done by an ljmp.)
It then loads the first 15 sectors at 0x10000
(segment BOOTSEG in the biosboot Makefile), and sets
up the stack to work below 0x1fff0. After this, it
jumps to the entry of boot2 within that code. I.e., it jumps over
itself and the (dummy) partition table, and it is going to adjust the
%cs selector—we are still in 16-bit mode there.
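The %cs adjustment works because 0:0x7c00 and 0x7c0:0 name the same
linear address. Sketching the real-mode segment arithmetic in C (the
helper function is only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```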
boot2 asks for the boot file, and examines the
a.out header. It masks the file entry point
(usually 0xf0100000) by
0x00ffffff, and loads the file there. Hence the
usual load point is 1 MB (0x00100000). During load,
the boot code toggles back and forth between real and protected mode, to
use the BIOS in real mode.
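The masking step amounts to the following (an illustrative helper, not
the actual boot2 code): the high byte of the linked virtual address is
stripped off to produce the physical load point.

```c
#include <assert.h>
#include <stdint.h>

/* boot2 masks the a.out entry point down to a physical load address. */
static uint32_t load_address(uint32_t entry)
{
    return entry & 0x00ffffff;   /* strip the high (virtual) bits */
}
```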
The boot code itself uses segment selectors 0x18
and 0x20 for %cs and
%ds/%es in protected mode, and
0x28 to jump back into real mode. The kernel is
finally started with %cs 0x08 and
%ds/%es/%ss 0x10, which refer to
dummy descriptors covering the entire address space.
The kernel will be started at its load point. Since it has been
linked for another (high) address, it will have to execute PIC until the
page table and page directory stuff is set up properly, at which point
paging will be enabled and the kernel will finally run at the address
for which it was linked.
Contributed by &a.dg;. 16 Apr
1995.
The physical pages immediately following the kernel BSS contain
proc0's page directory, page tables, and upages. Some time later when
the VM system is initialized, the physical memory between
0x1000-0x9ffff and the physical memory after the
kernel (text+data+bss+proc0 stuff+other misc) is made available in the
form of general VM pages and added to the global free page list.
DMA: What it Is and How it Works
Copyright © 1995,1997 &a.uhclem;, All Rights
Reserved. 10 December 1996. Last Update 8 October
1997.
Direct Memory Access (DMA) is a method of allowing data to be moved
from one location to another in a computer without intervention from the
central processor (CPU).
The way that the DMA function is implemented varies between computer
architectures, so this discussion will limit itself to the
implementation and workings of the DMA subsystem on the IBM Personal
Computer (PC), the IBM PC/AT and all of its successors and
clones.
The PC DMA subsystem is based on the Intel 8237 DMA controller. The
8237 contains four DMA channels that can be programmed independently and
any one of the channels may be active at any moment. These channels are
numbered 0, 1, 2 and 3. Starting with the PC/AT, IBM added a second
8237 chip, and numbered those channels 4, 5, 6 and 7.
The original DMA controller (0, 1, 2 and 3) moves one byte in each
transfer. The second DMA controller (4, 5, 6, and 7) moves 16-bits from
two adjacent memory locations in each transfer, with the first byte
always coming from an even-numbered address. The two controllers are
identical components and the difference in transfer size is caused by
the way the second controller is wired into the system.
The 8237 has two electrical signals for each channel, named DRQ and
-DACK. There are additional signals with the names HRQ (Hold Request),
HLDA (Hold Acknowledge), -EOP (End of Process), and the bus control
signals -MEMR (Memory Read), -MEMW (Memory Write), -IOR (I/O Read), and
-IOW (I/O Write).
The 8237 DMA is known as a “fly-by” DMA controller.
This means that the data being moved from one location to another does
not pass through the DMA chip and is not stored in the DMA chip.
Subsequently, the DMA can only transfer data between an I/O port and a
memory address, but not between two I/O ports or two memory
locations.
The 8237 does allow two channels to be connected together to allow
memory-to-memory DMA operations in a non-“fly-by” mode,
but nobody in the PC industry uses this scarce resource this way since
it is faster to move data between memory locations using the
CPU.
In the PC architecture, each DMA channel is normally activated only
when the hardware that uses a given DMA channel requests a transfer by
asserting the DRQ line for that channel.
A Sample DMA transfer
Here is an example of the steps that occur to cause and perform a
DMA transfer. In this example, the floppy disk controller (FDC) has
just read a byte from a diskette and wants the DMA to place it in
memory at location 0x00123456. The process begins by the FDC
asserting the DRQ2 signal (the DRQ line for DMA channel 2) to alert
the DMA controller.
The DMA controller will note that the DRQ2 signal is asserted. The
DMA controller will then make sure that DMA channel 2 has been
programmed and is unmasked (enabled). The DMA controller also makes
sure that none of the other DMA channels are active or want to be
active and have a higher priority. Once these checks are complete,
the DMA asks the CPU to release the bus so that the DMA may use the
bus. The DMA requests the bus by asserting the HRQ signal which goes
to the CPU.
The CPU detects the HRQ signal, and will complete executing the
current instruction. Once the processor has reached a state where it
can release the bus, it will. Now all of the signals normally
generated by the CPU (-MEMR, -MEMW, -IOR, -IOW and a few others) are
placed in a tri-stated condition (neither high nor low) and then the
CPU asserts the HLDA signal which tells the DMA controller that it is
now in charge of the bus.
Depending on the processor, the CPU may be able to execute a few
additional instructions now that it no longer has the bus, but the CPU
will eventually have to wait when it reaches an instruction that must
read something from memory that is not in the internal processor cache
or pipeline.
Now that the DMA “is in charge”, the DMA activates its
-MEMR, -MEMW, -IOR, -IOW output signals, and the address outputs from
the DMA are set to 0x3456, which will be used to direct the byte that
is about to be transferred to a specific memory location.
The DMA will then let the device that requested the DMA transfer
know that the transfer is commencing. This is done by asserting the
-DACK signal, or in the case of the floppy disk controller, -DACK2 is
asserted.
The floppy disk controller is now responsible for placing the byte
to be transferred on the bus Data lines. Unless the floppy controller
needs more time to get the data byte on the bus (and if the peripheral
does need more time it alerts the DMA via the READY signal), the DMA
will wait one DMA clock, and then de-assert the -MEMW and -IOR signals
so that the memory will latch and store the byte that was on the bus,
and the FDC will know that the byte has been transferred.
Since the DMA cycle only transfers a single byte at a time, the
FDC now drops the DRQ2 signal, so the DMA knows that it is no longer
needed. The DMA will de-assert the -DACK2 signal, so that the FDC
knows it must stop placing data on the bus.
The DMA will now check to see if any of the other DMA channels
have any work to do. If none of the channels have their DRQ lines
asserted, the DMA controller has completed its work and will now
tri-state the -MEMR, -MEMW, -IOR, -IOW and address signals.
Finally, the DMA will de-assert the HRQ signal. The CPU sees
this, and de-asserts the HLDA signal. Now the CPU activates its
-MEMR, -MEMW, -IOR, -IOW and address lines, and it resumes executing
instructions and accessing main memory and the peripherals.
For a typical floppy disk sector, the above process is repeated
512 times, once for each byte. Each time a byte is transferred, the
address register in the DMA is incremented and the counter in the DMA
that shows how many bytes are to be transferred is decremented.
When the counter reaches zero, the DMA asserts the EOP signal,
which indicates that the counter has reached zero and no more data
will be transferred until the DMA controller is reprogrammed by the
CPU. This event is also called the Terminal Count (TC). There is only
one EOP signal, and since only one DMA channel can be active at any
instant, the DMA channel that is currently active must be the DMA
channel that just completed its task.
If a peripheral wants to generate an interrupt when the transfer
of a buffer is complete, it can test for its -DACKn signal and the EOP
signal both being asserted at the same time. When that happens, it
means the DMA will not transfer any more information for that
peripheral without intervention by the CPU. The peripheral can then
assert one of the interrupt signals to get the processor's attention.
In the PC architecture, the DMA chip itself is not capable of
generating an interrupt. The peripheral and its associated hardware
is responsible for generating any interrupt that occurs.
Subsequently, it is possible to have a peripheral that uses DMA but
does not use interrupts.
It is important to understand that although the CPU always
releases the bus to the DMA when the DMA makes the request, this
action is invisible to both applications and the operating systems,
except for slight changes in the amount of time the processor takes to
execute instructions when the DMA is active. Subsequently, the
processor must poll the peripheral, poll the registers in the DMA
chip, or receive an interrupt from the peripheral to know for certain
when a DMA transfer has completed.
DMA Page Registers and 16Meg address space limitations
You may have noticed that instead of the DMA setting the
address lines to 0x00123456 as we said earlier, the DMA only set
0x3456. The reason for this takes a bit of explaining.
When the original IBM PC was designed, IBM elected to use both DMA
and interrupt controller chips that were designed for use with the
8085, an 8-bit processor with an address space of 16 bits (64K).
Since the IBM PC supported more than 64K of memory, something had to
be done to allow the DMA to read or write memory locations above the
64K mark. What IBM did to solve this problem was to add an external
data latch for each DMA channel that holds the upper bits of the
address to be read from or written to. Whenever a DMA channel is
active, the contents of that latch are written to the address bus and
kept there until the DMA operation for the channel ends. IBM called
these latches “Page Registers”.
So for our example above, the DMA would put the 0x3456 part of the
address on the bus, and the Page Register for DMA channel 2 would put
0x0012xxxx on the bus. Together, these two values form the complete
address in memory that is to be accessed.
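The combination can be expressed as follows (an illustrative helper):
the page register supplies the high bits and the 8237 the low 16 bits,
and on the bus they are simply OR-ed together, with no carry between
them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * PC/AT DMA bus address: the Page Register drives the high address
 * lines while the 8237 drives the low 16 bits.  Unlike 8086
 * segment:offset arithmetic, there is no addition and no overlap.
 */
static uint32_t dma_bus_address(uint8_t page, uint16_t offset)
{
    return ((uint32_t)page << 16) | offset;
}
```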
Because the Page Register latch is independent of the DMA chip,
the area of memory to be read or written must not span a 64K physical
boundary. For example, if the DMA accesses memory location 0xffff,
after that transfer the DMA will then increment the address register
and the DMA will access the next byte at location 0x0000, not 0x10000.
The results of letting this happen are probably not intended.
“Physical” 64K boundaries should not be confused
with 8086-mode 64K “Segments”, which are created by
mathematically adding a segment register with an offset register.
Page Registers have no address overlap and are mathematically OR-ed
together.
To further complicate matters, the external DMA address latches on
the PC/AT hold only eight bits, so that gives us 8+16=24 bits, which
means that the DMA can only point at memory locations between 0 and
16Meg. For newer computers that allow more than 16Meg of memory, the
standard PC-compatible DMA cannot access memory locations above
16Meg.
To get around this restriction, operating systems will reserve a
RAM buffer in an area below 16Meg that also does not span a physical
64K boundary. Then the DMA will be programmed to transfer data from
the peripheral and into that buffer. Once the DMA has moved the data
into this buffer, the operating system will then copy the data from
the buffer to the address where the data is really supposed to be
stored.
When writing data from an address above 16Meg to a DMA-based
peripheral, the data must be first copied from where it resides into a
buffer located below 16Meg, and then the DMA can copy the data from
the buffer to the hardware. In FreeBSD, these reserved buffers are
called “Bounce Buffers”. In the MS-DOS world, they are
sometimes called “Smart Buffers”.
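Taken together, the two restrictions above can be sketched as the check
an operating system might perform before handing a buffer directly to
the PC/AT DMA (the function name is hypothetical): the region must lie
below 16Meg and must not cross a 64K physical boundary, since the Page
Register is not incremented when the 8237's address register wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Can this physical region be used directly for PC/AT DMA? */
static int dma_buffer_ok(uint32_t phys, uint32_t len)
{
    if (len == 0 || phys + len > (1u << 24))        /* above 16Meg */
        return 0;
    if ((phys >> 16) != ((phys + len - 1) >> 16))   /* spans 64K */
        return 0;
    return 1;
}
```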
A new implementation of the 8237, called the 82374, allows 16
bits of page register to be specified, allowing access to the entire
32 bit address space without the use of bounce buffers.
DMA Operational Modes and Settings
The 8237 DMA can be operated in several modes. The main ones
are:
Single
A single byte (or word) is transferred. The DMA must
release and re-acquire the bus for each additional byte. This is
commonly used by devices that cannot transfer the entire block
of data immediately. The peripheral will request the DMA each
time it is ready for another transfer.
The standard PC-compatible floppy disk controller (NEC 765)
only has a one-byte buffer, so it uses this mode.
Block/Demand
Once the DMA acquires the system bus, an entire block of
data is transferred, up to a maximum of 64K. If the peripheral
needs additional time, it can assert the READY signal to suspend
the transfer briefly. READY should not be used excessively, and
for slow peripheral transfers, the Single Transfer Mode should
be used instead.
The difference between Block and Demand is that once a Block
transfer is started, it runs until the transfer count reaches
zero. DRQ only needs to be asserted until -DACK is asserted.
Demand Mode will transfer one or more bytes until DRQ is
de-asserted, at which point the DMA suspends the transfer and
releases the bus back to the CPU. When DRQ is asserted later,
the transfer resumes where it was suspended.
Older hard disk controllers used Demand Mode until CPU
speeds increased to the point that it was more efficient to
transfer the data using the CPU, particularly if the memory
locations used in the transfer were above the 16Meg mark.
Cascade
This mechanism allows a DMA channel to request the bus, but
then the attached peripheral device is responsible for placing
the addressing information on the bus instead of the DMA. This
is also used to implement a technique known as “Bus
Mastering”.
When a DMA channel in Cascade Mode receives control of the
bus, the DMA does not place addresses and I/O control signals on
the bus like the DMA normally does when it is active. Instead,
the DMA only asserts the -DACK signal for the active DMA
channel.
At this point it is up to the peripheral connected to that
DMA channel to provide address and bus control signals. The
peripheral has complete control over the system bus, and can do
reads and/or writes to any address below 16Meg. When the
peripheral is finished with the bus, it de-asserts the DRQ line,
and the DMA controller can then return control to the CPU or to
some other DMA channel.
Cascade Mode can be used to chain multiple DMA controllers
together, and this is exactly what DMA Channel 4 is used for in
the PC architecture. When a peripheral requests the bus on DMA
channels 0, 1, 2 or 3, the slave DMA controller asserts HLDREQ,
but this wire is actually connected to DRQ4 on the primary DMA
controller instead of to the CPU. The primary DMA controller,
thinking it has work to do on Channel 4, requests the bus from
the CPU using the HLDREQ signal. Once the CPU grants the bus to the
primary DMA controller, -DACK4 is asserted, and that wire is
actually connected to the HLDA signal on the slave DMA
controller. The slave DMA controller then transfers data for
the DMA channel that requested it (0, 1, 2 or 3), or the slave
DMA may grant the bus to a peripheral that wants to perform its
own bus-mastering, such as a SCSI controller.
Because of this wiring arrangement, only DMA channels 0, 1,
2, 3, 5, 6 and 7 are usable with peripherals on PC/AT
systems.
DMA channel 0 was reserved for refresh operations in early
IBM PC computers, but is generally available for use by
peripherals in modern systems.
When a peripheral is performing Bus Mastering, it is
important that the peripheral transmit data to or from memory
constantly while it holds the system bus. If the peripheral
cannot do this, it must release the bus frequently so that the
system can perform refresh operations on main memory.
The Dynamic RAM used in all PCs for main memory must be
accessed frequently to keep the bits stored in the components
“charged”. Dynamic RAM essentially consists of
millions of capacitors with each one holding one bit of data.
These capacitors are charged with power to represent a
1 or drained to represent a
0. Because all capacitors leak, power must
be added at regular intervals to keep the 1
values intact. The RAM chips actually handle the task of
pumping power back into all of the appropriate locations in RAM,
but they must be told when to do it by the rest of the computer
so that the refresh activity won't interfere with the computer
wanting to access RAM normally. If the computer is unable to
refresh memory, the contents of memory will become corrupted in
just a few milliseconds.
Since memory read and write cycles “count” as
refresh cycles (a dynamic RAM refresh cycle is actually an
incomplete memory read cycle), as long as the peripheral
controller continues reading or writing data to sequential
memory locations, that action will refresh all of memory.
Bus-mastering is found in some SCSI host interfaces and
other high-performance peripheral controllers.
Autoinitialize
This mode causes the DMA to perform Byte, Block or Demand
transfers, but when the DMA transfer counter reaches zero, the
counter and address are set back to where they were when the DMA
channel was originally programmed. This means that as long as
the peripheral requests transfers, they will be granted. It is
up to the CPU to move new data into the fixed buffer ahead of
where the DMA is about to transfer it when doing output
operations, and read new data out of the buffer behind where the
DMA is writing when doing input operations.
This technique is frequently used on audio devices that have
small or no hardware “sample” buffers. There is
additional CPU overhead to manage this “circular”
buffer, but in some cases this may be the only way to eliminate
the latency that occurs when the DMA counter reaches zero and
the DMA stops transfers until it is reprogrammed.
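The pointer arithmetic behind such a circular buffer can be sketched in C. The names (BUF_SIZE, ring_room) are illustrative, not a real driver API; the point is only that both the CPU's fill position and the DMA's transfer position wrap modulo the buffer size.

```c
#include <stddef.h>

#define BUF_SIZE 4096            /* fixed buffer the DMA cycles over */

/* Advance a position the way the autoinitializing DMA counter does:
 * past the end of the buffer, it wraps back to the start. */
static size_t
ring_advance(size_t pos, size_t n)
{
    return (pos + n) % BUF_SIZE;
}

/* Bytes the CPU may safely fill for an output operation: everything
 * from its own position up to (but not including) the position the
 * DMA will read next.  Staying one byte short keeps "full" and
 * "empty" distinguishable. */
static size_t
ring_room(size_t cpu_pos, size_t dma_pos)
{
    return (dma_pos - cpu_pos - 1 + BUF_SIZE) % BUF_SIZE;
}
```

The same arithmetic, mirrored, governs input operations, where the CPU drains data behind the position the DMA is writing.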
Programming the DMA
The DMA channel that is to be programmed should always be
“masked” before loading any settings. This is because the
hardware might unexpectedly assert the DRQ for that channel, and the
DMA might respond, even though not all of the parameters have been
loaded or updated.
Once masked, the host must specify the direction of the transfer
(memory-to-I/O or I/O-to-memory), what mode of DMA operation is to be
used for the transfer (Single, Block, Demand, Cascade, etc), and
finally the address and length of the transfer are loaded. The length
that is loaded is one less than the amount you expect the DMA to
transfer. The LSB and MSB of the address and length are written to
the same 8-bit I/O port, so another port must be written to first to
guarantee that the DMA accepts the first byte as the LSB and the
second byte as the MSB of the length and address.
Then, be sure to update the Page Register, which is external to
the DMA and is accessed through a different set of I/O ports.
Once all the settings are ready, the DMA channel can be un-masked.
That DMA channel is now considered to be “armed”, and will
respond when the DRQ line for that channel is asserted.
Refer to a hardware data book for precise programming details for
the 8237. You will also need to refer to the I/O port map for the PC
system, which describes where the DMA and Page Register ports are
located. A complete port map table is located below.
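The sequence above can be sketched in C. The port numbers come from the port map; outb() is stubbed here so the sequence can be exercised off-hardware, and dma1_setup() is a hypothetical helper, not a FreeBSD kernel routine. The mode value 0x44 (single mode, I/O-to-memory transfer) follows the 8237 data sheet.

```c
#include <stdint.h>

/* Stub: record the last value written to each port instead of
 * touching hardware; a real driver would perform a port write. */
static uint8_t last_port_write[0x100];
static void outb(uint16_t port, uint8_t val) { last_port_write[port] = val; }

/* Program controller #1 channel ch (0-3) for a single-mode,
 * I/O-to-memory transfer of len bytes starting at the 24-bit
 * physical address addr. */
static void
dma1_setup(int ch, uint32_t addr, uint16_t len)
{
    static const uint16_t page_port[4] = { 0x87, 0x83, 0x81, 0x82 };
    uint16_t count = len - 1;                 /* the 8237 counts N-1 */

    outb(0x0a, 0x04 | ch);                    /* mask the channel first */
    outb(0x0b, 0x44 | ch);                    /* single mode, write (I/O-to-memory) */
    outb(0x0c, 0);                            /* clear the LSB/MSB flip-flop */
    outb(0x00 + ch * 2, addr & 0xff);         /* address LSB */
    outb(0x00 + ch * 2, (addr >> 8) & 0xff);  /* address MSB, same port */
    outb(page_port[ch], (addr >> 16) & 0xff); /* page register (bits 23-16) */
    outb(0x01 + ch * 2, count & 0xff);        /* count LSB */
    outb(0x01 + ch * 2, (count >> 8) & 0xff); /* count MSB, same port */
    outb(0x0a, ch);                           /* unmask: the channel is armed */
}
```

Note that the address and count each take two writes to the same port, which is why the flip-flop must be cleared first.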
DMA Port Map
All systems based on the IBM-PC and PC/AT have the DMA hardware
located at the same I/O ports. The complete list is provided below.
Ports assigned to DMA Controller #2 are undefined on non-AT
designs.
0x00–0x1f DMA Controller #1 (Channels 0, 1, 2 and
3)
DMA Address and Count Registers
0x00
write
Channel 0 starting address
0x00
read
Channel 0 current address
0x01
write
Channel 0 starting word count
0x01
read
Channel 0 remaining word count
0x02
write
Channel 1 starting address
0x02
read
Channel 1 current address
0x03
write
Channel 1 starting word count
0x03
read
Channel 1 remaining word count
0x04
write
Channel 2 starting address
0x04
read
Channel 2 current address
0x05
write
Channel 2 starting word count
0x05
read
Channel 2 remaining word count
0x06
write
Channel 3 starting address
0x06
read
Channel 3 current address
0x07
write
Channel 3 starting word count
0x07
read
Channel 3 remaining word count
DMA Command Registers
0x08
write
Command Register
0x08
read
Status Register
0x09
write
Request Register
0x09
read
-
0x0a
write
Single Mask Register Bit
0x0a
read
-
0x0b
write
Mode Register
0x0b
read
-
0x0c
write
Clear LSB/MSB Flip-Flop
0x0c
read
-
0x0d
write
Master Clear/Reset
0x0d
read
Temporary Register (not available on newer
versions)
0x0e
write
Clear Mask Register
0x0e
read
-
0x0f
write
Write All Mask Register Bits
0x0f
read
Read All Mask Register Bits (only in Intel
82374)
0xc0–0xdf DMA Controller #2 (Channels 4, 5, 6 and
7)
DMA Address and Count Registers
0xc0
write
Channel 4 starting address
0xc0
read
Channel 4 current address
0xc2
write
Channel 4 starting word count
0xc2
read
Channel 4 remaining word count
0xc4
write
Channel 5 starting address
0xc4
read
Channel 5 current address
0xc6
write
Channel 5 starting word count
0xc6
read
Channel 5 remaining word count
0xc8
write
Channel 6 starting address
0xc8
read
Channel 6 current address
0xca
write
Channel 6 starting word count
0xca
read
Channel 6 remaining word count
0xcc
write
Channel 7 starting address
0xcc
read
Channel 7 current address
0xce
write
Channel 7 starting word count
0xce
read
Channel 7 remaining word count
DMA Command Registers
0xd0
write
Command Register
0xd0
read
Status Register
0xd2
write
Request Register
0xd2
read
-
0xd4
write
Single Mask Register Bit
0xd4
read
-
0xd6
write
Mode Register
0xd6
read
-
0xd8
write
Clear LSB/MSB Flip-Flop
0xd8
read
-
0xda
write
Master Clear/Reset
0xda
read
Temporary Register (not present in Intel
82374)
0xdc
write
Clear Mask Register
0xdc
read
-
0xde
write
Write All Mask Register Bits
0xdf
read
Read All Mask Register Bits (only in Intel
82374)
0x80–0x9f DMA Page Registers
0x87
r/w
Channel 0 Low byte (23-16) page Register
0x83
r/w
Channel 1 Low byte (23-16) page Register
0x81
r/w
Channel 2 Low byte (23-16) page Register
0x82
r/w
Channel 3 Low byte (23-16) page Register
0x8b
r/w
Channel 5 Low byte (23-16) page Register
0x89
r/w
Channel 6 Low byte (23-16) page Register
0x8a
r/w
Channel 7 Low byte (23-16) page Register
0x8f
r/w
Low byte page Refresh
0x400–0x4ff 82374 Enhanced DMA Registers
The Intel 82374 EISA System Component (ESC) was introduced in
early 1996 and includes a DMA controller that provides a superset of
8237 functionality as well as other PC-compatible core peripheral
components in a single package. This chip is targeted at both EISA
and PCI platforms, and provides modern DMA features like
scatter-gather, ring buffers as well as direct access by the system
DMA to all 32 bits of address space.
If these features are used, code should also be included to
provide similar functionality in the previous 16 years' worth of
PC-compatible computers. For compatibility reasons, some of the
82374 registers must be programmed after
programming the traditional 8237 registers for each transfer.
Writing to a traditional 8237 register forces the contents of some
of the 82374 enhanced registers to zero to provide backward software
compatibility.
0x401
r/w
Channel 0 High byte (bits 23-16) word count
0x403
r/w
Channel 1 High byte (bits 23-16) word count
0x405
r/w
Channel 2 High byte (bits 23-16) word count
0x407
r/w
Channel 3 High byte (bits 23-16) word count
0x4c6
r/w
Channel 5 High byte (bits 23-16) word count
0x4ca
r/w
Channel 6 High byte (bits 23-16) word count
0x4ce
r/w
Channel 7 High byte (bits 23-16) word count
0x487
r/w
Channel 0 High byte (bits 31-24) page Register
0x483
r/w
Channel 1 High byte (bits 31-24) page Register
0x481
r/w
Channel 2 High byte (bits 31-24) page Register
0x482
r/w
Channel 3 High byte (bits 31-24) page Register
0x48b
r/w
Channel 5 High byte (bits 31-24) page Register
0x489
r/w
Channel 6 High byte (bits 31-24) page Register
0x48a
r/w
Channel 7 High byte (bits 31-24) page Register
0x48f
r/w
High byte page Refresh
0x4e0
r/w
Channel 0 Stop Register (bits 7-2)
0x4e1
r/w
Channel 0 Stop Register (bits 15-8)
0x4e2
r/w
Channel 0 Stop Register (bits 23-16)
0x4e4
r/w
Channel 1 Stop Register (bits 7-2)
0x4e5
r/w
Channel 1 Stop Register (bits 15-8)
0x4e6
r/w
Channel 1 Stop Register (bits 23-16)
0x4e8
r/w
Channel 2 Stop Register (bits 7-2)
0x4e9
r/w
Channel 2 Stop Register (bits 15-8)
0x4ea
r/w
Channel 2 Stop Register (bits 23-16)
0x4ec
r/w
Channel 3 Stop Register (bits 7-2)
0x4ed
r/w
Channel 3 Stop Register (bits 15-8)
0x4ee
r/w
Channel 3 Stop Register (bits 23-16)
0x4f4
r/w
Channel 5 Stop Register (bits 7-2)
0x4f5
r/w
Channel 5 Stop Register (bits 15-8)
0x4f6
r/w
Channel 5 Stop Register (bits 23-16)
0x4f8
r/w
Channel 6 Stop Register (bits 7-2)
0x4f9
r/w
Channel 6 Stop Register (bits 15-8)
0x4fa
r/w
Channel 6 Stop Register (bits 23-16)
0x4fc
r/w
Channel 7 Stop Register (bits 7-2)
0x4fd
r/w
Channel 7 Stop Register (bits 15-8)
0x4fe
r/w
Channel 7 Stop Register (bits 23-16)
0x40a
write
Channels 0-3 Chaining Mode Register
0x40a
read
Channel Interrupt Status Register
0x4d4
write
Channels 4-7 Chaining Mode Register
0x4d4
read
Chaining Mode Status
0x40c
read
Chain Buffer Expiration Control Register
0x410
write
Channel 0 Scatter-Gather Command Register
0x411
write
Channel 1 Scatter-Gather Command Register
0x412
write
Channel 2 Scatter-Gather Command Register
0x413
write
Channel 3 Scatter-Gather Command Register
0x415
write
Channel 5 Scatter-Gather Command Register
0x416
write
Channel 6 Scatter-Gather Command Register
0x417
write
Channel 7 Scatter-Gather Command Register
0x418
read
Channel 0 Scatter-Gather Status Register
0x419
read
Channel 1 Scatter-Gather Status Register
0x41a
read
Channel 2 Scatter-Gather Status Register
0x41b
read
Channel 3 Scatter-Gather Status Register
0x41d
read
Channel 5 Scatter-Gather Status Register
0x41e
read
Channel 6 Scatter-Gather Status Register
0x41f
read
Channel 7 Scatter-Gather Status Register
0x420-0x423
r/w
Channel 0 Scatter-Gather Descriptor Table Pointer
Register
0x424-0x427
r/w
Channel 1 Scatter-Gather Descriptor Table Pointer
Register
0x428-0x42b
r/w
Channel 2 Scatter-Gather Descriptor Table Pointer
Register
0x42c-0x42f
r/w
Channel 3 Scatter-Gather Descriptor Table Pointer
Register
0x434-0x437
r/w
Channel 5 Scatter-Gather Descriptor Table Pointer
Register
0x438-0x43b
r/w
Channel 6 Scatter-Gather Descriptor Table Pointer
Register
0x43c-0x43f
r/w
Channel 7 Scatter-Gather Descriptor Table Pointer
Register
The FreeBSD VM System
Contributed by &a.dillon;. 6 Feb 1999
Management of physical
memory—vm_page_t
Physical memory is managed on a page-by-page basis through the
vm_page_t structure. Pages of physical memory are
categorized through the placement of their respective
vm_page_t structures on one of several paging
queues.
A page can be in a wired, active, inactive, cache, or free state.
Except for the wired state, the page is typically placed in a doubly
linked list queue representing the state that it is in. Wired pages
are not placed on any queue.
FreeBSD implements a more involved paging queue for cached and
free pages in order to implement page coloring. Each of these states
involves multiple queues arranged according to the size of the
processor's L1 and L2 caches. When a new page needs to be allocated,
FreeBSD attempts to obtain one that is reasonably well aligned from
the point of view of the L1 and L2 caches relative to the VM object
the page is being allocated for.
Additionally, a page may be held with a reference count or locked
with a busy count. The VM system also implements an “ultimate
locked” state for a page using the PG_BUSY bit in the page's
flags.
In general terms, each of the paging queues operates in a LRU
fashion. A page is typically placed in a wired or active state
initially. When wired, the page is usually associated with a page
table somewhere. The VM system ages the page by scanning pages in a
more active paging queue (LRU) in order to move them to a less-active
paging queue. Pages that get moved into the cache are still
associated with a VM object but are candidates for immediate reuse.
Pages in the free queue are truly free. FreeBSD attempts to minimize
the number of pages in the free queue, but a certain minimum number of
truly free pages must be maintained in order to accommodate page
allocation at interrupt time.
If a process attempts to access a page that does not exist in its
page table but does exist in one of the paging queues (such as the
inactive or cache queues), a relatively inexpensive page reactivation
fault occurs which causes the page to be reactivated. If the page
does not exist in system memory at all, the process must block while
the page is brought in from disk.
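The queue mechanics described above amount to doubly linked LRU lists, one per state. The following toy model only loosely mimics the kernel's structures and is not the real vm_page_t:

```c
#include <stddef.h>

enum page_state { PQ_WIRED, PQ_ACTIVE, PQ_INACTIVE, PQ_CACHE, PQ_FREE };

struct page {
    enum page_state state;
    struct page *prev, *next;    /* queue linkage; unused when wired */
};

struct pagequeue {
    struct page *head, *tail;    /* head = most recent, tail = LRU victim */
};

/* Place a page at the head of a queue; other pages age toward the tail. */
static void
pq_insert_head(struct pagequeue *q, struct page *p, enum page_state s)
{
    p->state = s;
    p->prev = NULL;
    p->next = q->head;
    if (q->head != NULL)
        q->head->prev = p;
    q->head = p;
    if (q->tail == NULL)
        q->tail = p;
}

/* The pageout daemon takes the least recently used page from the tail. */
static struct page *
pq_remove_tail(struct pagequeue *q)
{
    struct page *p = q->tail;

    if (p == NULL)
        return NULL;
    q->tail = p->prev;
    if (q->tail != NULL)
        q->tail->next = NULL;
    else
        q->head = NULL;
    return p;
}
```

A reactivation fault in this model is simply pq_remove followed by pq_insert_head on a more active queue.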
FreeBSD dynamically tunes its paging queues and attempts to
maintain reasonable ratios of pages in the various queues as well as
attempts to maintain a reasonable breakdown of clean vs dirty pages.
The amount of rebalancing that occurs depends on the system's memory
load. This rebalancing is implemented by the pageout daemon and
involves laundering dirty pages (syncing them with their backing
store), noticing when pages are actively referenced (resetting their
position in the LRU queues or moving them between queues), migrating
pages between queues when the queues are out of balance, and so forth.
FreeBSD's VM system is willing to take a reasonable number of
reactivation page faults to determine how active or how idle a page
actually is. This leads to better decisions being made as to when to
launder or swap-out a page.
The unified buffer
cache—vm_object_t
FreeBSD implements the idea of a generic “VM object”.
VM objects can be associated with backing store of various
types—unbacked, swap-backed, physical device-backed, or
file-backed storage. Since the filesystem uses the same VM objects to
manage in-core data relating to files, the result is a unified buffer
cache.
VM objects can be shadowed. That is, they
can be stacked on top of each other. For example, you might have a
swap-backed VM object stacked on top of a file-backed VM object in
order to implement a MAP_PRIVATE mmap()ing. This stacking is also
used to implement various sharing properties, including
copy-on-write, for forked address spaces.
It should be noted that a vm_page_t can only be
associated with one VM object at a time. The VM object shadowing
implements the perceived sharing of the same page across multiple
instances.
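The fall-through behavior of shadowed objects can be illustrated with a toy lookup: to find the page backing an offset, start at the top object and fall through to the backing object when the top one has no private copy yet. The structures below are invented for illustration and are far simpler than the real vm_object_t:

```c
#include <stddef.h>

#define NPAGES 16

struct vmobj {
    struct vmobj *backing;                      /* object we shadow, or NULL */
    struct page_slot { int valid; int data; } pages[NPAGES];
};

/* Return the slot that currently backs page index idx, searching down
 * the shadow chain; NULL if no object in the chain provides it. */
static struct page_slot *
shadow_lookup(struct vmobj *obj, int idx)
{
    for (; obj != NULL; obj = obj->backing)
        if (obj->pages[idx].valid)
            return &obj->pages[idx];
    return NULL;
}
```

In a MAP_PRIVATE mapping, a write fault installs a private copy in the top (swap-backed) object; subsequent lookups then find that copy instead of falling through to the file-backed object.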
Filesystem I/O—struct buf
vnode-backed VM objects, such as file-backed objects, generally
need to maintain their own clean/dirty information independent of the VM
system's idea of clean/dirty. For example, when the VM system decides
to synchronize a physical page to its backing store, the VM system
needs to mark the page clean before the page is actually written to
its backing store. Additionally, filesystems need to be able to map
portions of a file or file metadata into KVM in order to operate on
it.
The entities used to manage this are known as filesystem buffers,
struct buf's, and also known as
bp's. When a filesystem needs to operate on a
portion of a VM object, it typically maps part of the object into a
struct buf and then maps the pages in the struct buf into KVM. In the
same manner, disk I/O is typically issued by mapping portions of
objects into buffer structures and then issuing the I/O on the buffer
structures. The underlying vm_page_t's are typically busied for the
duration of the I/O. Filesystem buffers also have their own notion of
being busy, which is useful to filesystem driver code which would
rather operate on filesystem buffers instead of hard VM pages.
FreeBSD reserves a limited amount of KVM to hold mappings from
struct bufs, but it should be made clear that this KVM is used solely
to hold mappings and does not limit the ability to cache data.
Physical data caching is strictly a function of
vm_page_t's, not filesystem buffers. However,
since filesystem buffers are used as placeholders for I/O, they do inherently
limit the amount of concurrent I/O possible. As there are usually a
few thousand filesystem buffers available, this is not usually a
problem.
Mapping Page Tables - vm_map_t, vm_entry_t
FreeBSD separates the physical page table topology from the VM
system. All hard per-process page tables can be reconstructed on the
fly and are usually considered throwaway. Special page tables such as
those managing KVM are typically permanently preallocated. These page
tables are not throwaway.
FreeBSD associates portions of vm_objects with address ranges in
virtual memory through vm_map_t and
vm_entry_t structures. Page tables are directly
synthesized from the
vm_map_t/vm_entry_t/
vm_object_t hierarchy. Remember when I mentioned
that physical pages are only directly associated with a
vm_object? Well, that isn't quite true.
vm_page_t's are also linked into page tables that
they are actively associated with. One vm_page_t
can be linked into several pmaps (the name given to
page tables). However, the hierarchical association holds, so all
references to the same page in the same object reference the same
vm_page_t and thus give us buffer cache unification
across the board.
KVM Memory Mapping
FreeBSD uses KVM to hold various kernel structures. The single
largest entity held in KVM is the filesystem buffer cache. That is,
mappings relating to struct buf entities.
Unlike Linux, FreeBSD does NOT map all of physical memory into
KVM. This means that FreeBSD can handle memory configurations up to
4GB on 32-bit platforms. In fact, if the MMU were capable of it,
FreeBSD could theoretically handle memory configurations up to 8TB on
a 32-bit platform. However, since most 32-bit platforms are only
capable of mapping 4GB of RAM, this is a moot point.
KVM is managed through several mechanisms. The main mechanism
used to manage KVM is the zone allocator. The
zone allocator takes a chunk of KVM and splits it up into
constant-sized blocks of memory in order to allocate a specific type
of structure. You can use vmstat -m to get an
overview of current KVM utilization broken down by zone.
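The splitting scheme the zone allocator uses can be sketched as a free list threaded through constant-sized blocks. This is a simplified illustration of the idea, not the kernel's actual zone code:

```c
#include <stddef.h>

struct zone {
    void *freelist;              /* next free block, or NULL if exhausted */
};

/* Carve chunk (chunklen bytes, suitably aligned for pointers) into
 * blocks of size bytes (size >= sizeof(void *)) and chain them. */
static void
zinit(struct zone *z, void *chunk, size_t chunklen, size_t size)
{
    char *p = chunk;
    size_t n = chunklen / size;

    z->freelist = NULL;
    while (n-- > 0) {
        *(void **)p = z->freelist;   /* link block into the free list */
        z->freelist = p;
        p += size;
    }
}

/* Allocation and free are O(1): pop or push the free list. */
static void *
zalloc(struct zone *z)
{
    void *p = z->freelist;

    if (p != NULL)
        z->freelist = *(void **)p;
    return p;
}

static void
zfree(struct zone *z, void *p)
{
    *(void **)p = z->freelist;
    z->freelist = p;
}
```

Because every block in a zone has the same size, there is no per-allocation header and no fragmentation within the zone, which is why this scheme suits fixed-size kernel structures.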
Tuning the FreeBSD VM system
A concerted effort has been made to make the FreeBSD kernel
dynamically tune itself. Typically you do not need to mess with
anything beyond the maxusers and
NMBCLUSTERS kernel config options. That is, kernel
compilation options specified in (typically)
/usr/src/sys/i386/conf/CONFIG_FILE.
A description of all available kernel configuration options can be
found in /usr/src/sys/i386/conf/LINT.
In a large system configuration you may wish to increase
maxusers. Values typically range from 10 to 128.
Note that raising maxusers too high can cause the
system to overflow available KVM resulting in unpredictable operation.
It is better to leave maxusers at some reasonable number and add other
options, such as NMBCLUSTERS, to increase specific
resources.
If your system is going to use the network heavily, you may want
to increase NMBCLUSTERS. Typical values range from
1024 to 4096.
The NBUF parameter is also traditionally used
to scale the system. This parameter determines the amount of KVA the
system can use to map filesystem buffers for I/O. Note that this
parameter has nothing whatsoever to do with the unified buffer cache!
This parameter is dynamically tuned in 3.0-CURRENT and later kernels
and should generally not be adjusted manually. We recommend that you
not try to specify an NBUF
parameter. Let the system pick it. Too small a value can result in
extremely inefficient filesystem operation while too large a value can
starve the page queues by causing too many pages to become wired
down.
By default, FreeBSD kernels are not optimized. You can set
debugging and optimization flags with the
makeoptions directive in the kernel configuration.
Note that you should not use -g unless you can
accommodate the large (typically 7 MB+) kernels that result.
makeoptions DEBUG="-g"
makeoptions COPTFLAGS="-O2 -pipe"
Sysctl provides a way to tune kernel parameters at run-time. You
typically do not need to mess with any of the sysctl variables,
especially the VM related ones.
Run time VM and system tuning is relatively straightforward.
First, use softupdates on your UFS/FFS filesystems whenever possible.
/usr/src/contrib/sys/softupdates/README contains
instructions (and restrictions) on how to configure it.
Second, configure sufficient swap. You should have a swap
partition configured on each physical disk, up to four, even on your
“work” disks. You should have at least 2x the swap space
as you have main memory, and possibly even more if you do not have a
lot of memory. You should also size your swap partition based on the
maximum memory configuration you ever intend to put on the machine so
you do not have to repartition your disks later on. If you want to be
able to accommodate a crash dump, your first swap partition must be at
least as large as main memory and /var/crash must
have sufficient free space to hold the dump.
NFS-based swap is perfectly acceptable on -4.x or later systems,
but you must be aware that the NFS server will take the brunt of the
paging load.