Use unmapped bufs for indirect block buffers in bmap if the platform
has a direct map, and use the direct map to access the buffer data
instead of using the traditional virtually contiguous mapping.
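
The following is a user-space sketch of the access pattern described
above, under stated assumptions. It is not the bmap code. A static
array stands in for physical memory, and its base address plays the
role of the direct map, so every physical address has a fixed virtual
address. A buffer either carries a contiguous mapping (data) or is
unmapped (data == NULL) and is read page by page through the direct
map. All names (sim_buf, sim_dmap, sim_indir_entry, SIM_*) are
hypothetical. The 4 KiB page size and 64-bit block numbers are assumed
to mirror UFS2 on a typical 64-bit platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulation parameters, not the kernel's real values. */
#define SIM_PAGE_SIZE 4096u
#define SIM_NPHYS     8u   /* simulated physical pages */
#define SIM_MAXPAGES  4u   /* max pages backing one buffer */

/* Simulated physical memory. Its base address acts as the direct map:
 * each physical address translates to one fixed virtual address. */
static unsigned char sim_phys_mem[SIM_NPHYS * SIM_PAGE_SIZE];

/* Hypothetical direct-map translation: physical address -> pointer. */
static unsigned char *sim_dmap(uint64_t pa)
{
    return &sim_phys_mem[pa];
}

/* Hypothetical buffer whose backing pages need not be contiguous. */
struct sim_buf {
    uint64_t pages[SIM_MAXPAGES]; /* physical address of each page */
    size_t npages;
    unsigned char *data;          /* contiguous mapping, or NULL if unmapped */
};

/* Return entry idx of an indirect block holding 64-bit block numbers. */
static int64_t sim_indir_entry(const struct sim_buf *bp, size_t idx)
{
    size_t off = idx * sizeof(int64_t);
    int64_t v;

    assert(off / SIM_PAGE_SIZE < bp->npages);
    if (bp->data != NULL) {
        /* Traditional path: read through the contiguous mapping. */
        memcpy(&v, bp->data + off, sizeof(v));
        return v;
    }
    /* Unmapped path: find the page holding the entry and read it via
     * the direct map. An 8-byte entry at an 8-byte-aligned offset never
     * straddles a page boundary, so one page lookup is enough. */
    memcpy(&v, sim_dmap(bp->pages[off / SIM_PAGE_SIZE]) + off % SIM_PAGE_SIZE,
        sizeof(v));
    return v;
}
```

Both paths return the same value for the same entry. The unmapped path
never builds a contiguous mapping for the buffer, which is the work
this change avoids.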

On our 96-core boxes serving roughly 350 Gb/s of streaming video
traffic, this change reduces the CPU cycles consumed by about 8%.