Increase the default SDMA boundary size
The default SDMA buffer boundary of 4K makes the controller raise a DMA interrupt every time a transfer crosses a 4K boundary. The resulting volume of interrupts lowers performance. Increasing the boundary to 512K gave a significant increase in throughput.
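To show why the boundary size matters, here is a minimal sketch. It assumes an SDHCI-style host controller, where the SDMA buffer boundary is encoded as log2(boundary) - 12 (4K -> 0 ... 512K -> 7) in bits 14:12 of the 16-bit Block Size register. It also assumes that one DMA interrupt fires at each boundary crossing before the last chunk, which ends with transfer complete instead. The helper names make_blksz() and dma_interrupts() are hypothetical and are not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build an SDHCI-style Block Size register value.
 * Bits 14:12 hold the SDMA buffer boundary argument (log2(boundary) - 12),
 * bits 11:0 hold the transfer block size in bytes. */
static uint16_t make_blksz(uint32_t boundary_bytes, uint16_t block_size)
{
    uint32_t arg = 0;

    /* Precondition (assumed): boundary is a power of two in [4K, 512K]. */
    assert(boundary_bytes >= 4096u && boundary_bytes <= 512u * 1024u);
    assert((boundary_bytes & (boundary_bytes - 1)) == 0);

    /* Compute log2(boundary) - 12 by shifting 4K up until it matches. */
    while ((4096u << arg) < boundary_bytes)
        arg++;

    return (uint16_t)(((arg & 0x7u) << 12) | (block_size & 0xFFFu));
}

/* Hypothetical helper: number of DMA boundary interrupts for a transfer
 * that starts on a boundary-aligned address (the last chunk is assumed to
 * end with a transfer-complete interrupt instead). */
static uint32_t dma_interrupts(uint32_t transfer_bytes, uint32_t boundary_bytes)
{
    if (transfer_bytes == 0)
        return 0;
    return (transfer_bytes - 1) / boundary_bytes;
}
```

Under these assumptions, a 1 MiB aligned transfer triggers 255 boundary interrupts with a 4K boundary, but only 1 with a 512K boundary.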
Linux also uses 512K as its default boundary size.