D52770.1778088874.diff

diff --git a/share/man/man9/buf.9 b/share/man/man9/buf.9
--- a/share/man/man9/buf.9
+++ b/share/man/man9/buf.9
@@ -36,44 +36,70 @@
to map potentially disparate vm_page's into contiguous KVM for use by
(mainly file system) devices and device I/O.
This abstraction supports
-block sizes from DEV_BSIZE (usually 512) to upwards of several pages or more.
+block sizes from
+.Dv DEV_BSIZE
+(usually 512) to upwards of several pages or more.
It also supports a relatively primitive byte-granular valid range and dirty
range currently hardcoded for use by NFS.
The code implementing the
VM Buffer abstraction is mostly concentrated in
-.Pa /usr/src/sys/kern/vfs_bio.c .
+.Pa sys/kern/vfs_bio.c
+in the
+.Fx
+source tree.
.Pp
One of the most important things to remember when dealing with buffer pointers
-(struct buf) is that the underlying pages are mapped directly from the buffer
+.Pq Vt struct buf
+is that the underlying pages are mapped directly from the buffer
cache.
No data copying occurs in the scheme proper, though some file systems
such as UFS do have to copy a little when dealing with file fragments.
The second most important thing to remember is that due to the underlying page
-mapping, the b_data base pointer in a buf is always *page* aligned, not
-*block* aligned.
-When you have a VM buffer representing some b_offset and
-b_size, the actual start of the buffer is (b_data + (b_offset & PAGE_MASK))
-and not just b_data.
+mapping, the
+.Va b_data
+base pointer in a buf is always
+.Em page Ns -aligned ,
+not
+.Em block Ns -aligned .
+When you have a VM buffer representing some
+.Va b_offset
+and
+.Va b_size ,
+the actual start of the buffer is
+.Ql b_data + (b_offset & PAGE_MASK)
+and not just
+.Ql b_data .
Finally, the VM system's core buffer cache supports
-valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks.
+valid and dirty bits
+.Pq Va m->valid , m->dirty
+for pages in
+.Dv DEV_BSIZE
+chunks.
Thus
a platform with a hardware page size of 4096 bytes has 8 valid and 8 dirty
bits.
These bits are generally set and cleared in groups based on the device
block size of the device backing the page.
Complete page's worth are often
-referred to using the VM_PAGE_BITS_ALL bitmask (i.e., 0xFF if the hardware page
+referred to using the
+.Dv VM_PAGE_BITS_ALL
+bitmask (i.e., 0xFF if the hardware page
size is 4096).
.Pp
VM buffers also keep track of a byte-granular dirty range and valid range.
This feature is normally only used by the NFS subsystem.
I am not sure why it
-is used at all, actually, since we have DEV_BSIZE valid/dirty granularity
+is used at all, actually, since we have
+.Dv DEV_BSIZE
+valid/dirty granularity
within the VM buffer.
-If a buffer dirty operation creates a 'hole',
+If a buffer dirty operation creates a
+.Dq hole ,
the dirty range will extend to cover the hole.
If a buffer validation
-operation creates a 'hole' the byte-granular valid range is left alone and
+operation creates a
+.Dq hole
+the byte-granular valid range is left alone and
will not take into account the new extension.
Thus the whole byte-granular
abstraction is considered a bad hack and it would be nice if we could get rid
@@ -81,16 +107,24 @@
.Pp
A VM buffer is capable of mapping the underlying VM cache pages into KVM in
order to allow the kernel to directly manipulate the data associated with
-the (vnode,b_offset,b_size).
+the
+.Pq Va vnode , b_offset , b_size .
The kernel typically unmaps VM buffers the moment
-they are no longer needed but often keeps the 'struct buf' structure
-instantiated and even bp->b_pages array instantiated despite having unmapped
+they are no longer needed but often keeps the
+.Vt struct buf
+structure
+instantiated and even
+.Va bp->b_pages
+array instantiated despite having unmapped
them from KVM.
If a page making up a VM buffer is about to undergo I/O, the
-system typically unmaps it from KVM and replaces the page in the b_pages[]
+system typically unmaps it from KVM and replaces the page in the
+.Va b_pages[]
array with a place-marker called bogus_page.
The place-marker forces any kernel
-subsystems referencing the associated struct buf to re-lookup the associated
+subsystems referencing the associated
+.Vt struct buf
+to re-lookup the associated
page.
I believe the place-marker hack is used to allow sophisticated devices
such as file system devices to remap underlying pages in order to deal with,
@@ -107,18 +141,29 @@
If not
treated carefully, these pages could be thrown away!
Indeed, a number of
-serious bugs related to this hack were not fixed until the 2.2.8/3.0 release.
-The kernel uses an instantiated VM buffer (i.e., struct buf) to place-mark pages
+serious bugs related to this hack were not fixed until the
+.Fx 2.2.8 /
+.Fx 3.0
+release.
+The kernel uses an instantiated VM buffer (i.e.,
+.Vt struct buf )
+to place-mark pages
in this special state.
-The buffer is typically flagged B_DELWRI.
+The buffer is typically flagged
+.Dv B_DELWRI .
When a
-device no longer needs a buffer it typically flags it as B_RELBUF.
+device no longer needs a buffer it typically flags it as
+.Dv B_RELBUF .
Due to
-the underlying pages being marked clean, the B_DELWRI|B_RELBUF combination must
+the underlying pages being marked clean, the
+.Ql B_DELWRI|B_RELBUF
+combination must
be interpreted to mean that the buffer is still actually dirty and must be
written to its backing store before it can actually be released.
In the case
-where B_DELWRI is not set, the underlying dirty pages are still properly
+where
+.Dv B_DELWRI
+is not set, the underlying dirty pages are still properly
marked as dirty and the buffer can be completely freed without losing that
clean/dirty state information.
(XXX do we have to check other flags in
@@ -128,7 +173,9 @@
maps.
Even though this is virtual space (since the buffers are mapped
from the buffer cache), we cannot make it arbitrarily large because
-instantiated VM Buffers (struct buf's) prevent their underlying pages in the
+instantiated VM Buffers
+.Pq Vt struct buf Ap s
+prevent their underlying pages in the
buffer cache from being freed.
This can complicate the life of the paging
system.
