# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018-2020 NXP

QDMA
----

Introduction:
The qDMA controller transfers blocks of data between one source and one or
more destinations. The blocks of data transferred can be represented in
memory as contiguous or non-contiguous using scatter/gather table(s). Channel
virtualization is supported through enqueuing of DMA jobs to, or dequeuing
DMA jobs from, different work queues.

The qDMA supports channel virtualization by allowing DMA jobs to be enqueued
into different command queues. Core can initiate a DMA transaction by
preparing a command descriptor (CD) for each DMA job and enqueuing this job
through a command queue.  The qDMA prefetches DMA jobs from command queues.

It then schedules and dispatches to internal DMA hardware engines, which
generate read and write requests. Both qDMA source data and destination data
can be either contiguous or non-contiguous using one or more scatter/gather
tables.

DPDK based QDMA driver
----------------------

The DPAA2 QDMA is an implementation of the rawdev API in DPDK, that provide
means to initiate a DMA transaction from CPU. The initiated DMA is performed
without CPU being involved in the actual DMA transaction. This is achieved via
using the DPDMAI device exposed by MC on DPAA2 devices .

More info is available at:
https://doc.dpdk.org/guides/rawdevs/dpaa2_qdma.html

DPDK based QDMA APIs are available at:
https://doc.dpdk.org/api/rte__pmd__dpaa2__qdma_8h.html
or at: https://source.codeaurora.org/external/qoriq/qoriq-components/dpdk/tree
/drivers/raw/dpaa2_qdma/rte_pmd_dpaa2_qdma.h?h=integration

There is one sample application in NXP SDK to demonstrate the usages of QDMA
APIs.
1. dpdk-qdma-demo - simple app to transfer and test memory copy between mem and pci
in various options using QDMA.

=====================================
Precondition to run any dpdk based APP

=> create the MC objects for network and DMA in MC

# use DPDMAI count as 16 or higher e.g.
export DPDMAI_COUNT=48
source /usr/local/dpdk/dpaa2/dynamic_dpl.sh dpmac.5
or
source /usr/bin/dpdk-example/extras/dpaa2/dynamic_dpl.sh dpmac.5
export DPRC=dprc.2

------------------------------------------------------
qdma_demo - APP to test memory transfer over PCI
-----------------------------------------------------
This application demonstrates usage of QDMA driver API in order to do QDMA
transfer over PCIE. This can be used to test FPGA PCIe memory area read/write
throughput bandwidth, the current DPDK QDMA demo is ready for it.

The setup is as given below:

  PCIe Root (Host) device  <== PCI link ==> PCIe End point device

  qdma_demo runs on PCIE root device and it does data transfer between local
  memory of PCI host to the memory of End point over PCI link.

  The application has been tested uisng following hardware:
  - PCIe Root (host) - LX2 RDB
  - PCIe end point - LX2 RDB (or Any other PCI based EP)

2. Assumptions
  - The end point exposes the targeted memory of test using one of the BARs
  - Host Enumerates the End point correctly and assigns the addresses to the BAR
    which is used as target of memory transfer.
  - Host can read the End point BAR using tool like lspci.

3. Compilation

   Compilation command
   --------------------
   Note: If you are using Scatter Gather support, It is preferable to build DPDK through
	 standard compilation steps after manually setting RTE_LIBRTE_DPAA2_USE_PHYS_IOVA
	 as true in config/arm/meson.build.

   meson arm64-build --cross-file config/arm/arm64_dpaa_linux_gcc -Dexamples=qdma_demo
   ninja -C arm64-build

4. Steps of execution

  End Point steps
  ===============
  a) If LX2-EP PCI card, boot it to u-boot prompt only
  b) For standard PCI NIc card - nothing needs to be done.

  HOST - LX2 Root complex steps
  ========================
  a) Boot LX2 to Linux prompt
  b) run 'lspci -v' to check the address of BAR whose memory is targeted memory for test
    $ lspci -v

        0000:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
        Connection

        Subsystem: Intel Corporation Gigabit CT Desktop Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 106
        Memory at 30460c0000 (32-bit, non-prefetchable) [size=128K]
        Memory at 3046000000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 1000 [disabled] [size=32]
        Memory at 30460e0000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at 3046080000 [disabled] [size=256K]
        Kernel driver in use: e1000e

    Take the address of BAR you want to use in test and set it as value of
    "--pci_addr" command line argument to qdma_demo application

  c) Assign PCI device to userspace
        - load the UIO module, if not loaded already
              #modprobe uio_pci_generic
        - assign the device to userspace
              #/usr/local/share/dpdk/usertools/dpdk-devbind.py
              --bind=uio_pci_generic 0000:01:00.1

  d)Run qdma_demo application

	NOTE: At least 2 cores are required to run the test, one core is used for
        printing results/stats, other cores for running test.

	$export DPDMAI_COUNT=48
	$./dynamic_dpl.sh dpmac.3
	$export DPRC=dprc.2
    Case #1: mem to mem
	$./qdma_demo  -c 0x81 -- --packet_size=512 --test_case=mem_to_mem


    Case #2: mem to pci (using a Gen2-x1 - 1G PCI NIC)

	$./qdma_demo -c 0x81 -- --pci_addr=0x3046000000 --packet_size=512 --test_case=mem_to_pci
                EAL: Detected 8 lcore(s)
                EAL: Detected 1 NUMA nodes
                EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
                fslmc: Skipping invalid device (power)
                EAL: 256 hugepages of size 2097152 reserved, but no mounted hugetlbfs found
                for that size

                EAL: Probing VFIO support...
                EAL: VFIO support initialized
                EAL: PCI device 0000:01:00.0 on NUMA socket -1
                EAL:   Invalid NUMA socket, default to 0
                EAL:   probe driver: 8086:10d3 net_e1000_em
                PMD: dpni.1: netdev created
                PMD: dpni.2: netdev created
                qdma_parse_long_arg: PCI addr 3046000000
                qdma_parse_long_arg: Pkt size 512
                qdma_parse_long_arg:test case mem_to_pci
                qdma_demo_validate_args: Stats core id - 0
                test packet count 16
                Rate:71.99982 cpu freq:1800 MHz
                Spend :2000.000 ms
                Local bufs, g_buf 0x17a80b000, g_buf1 0x17a808000
                [0] job ptr 0x17a807000
                [7] job ptr 0x17a806000
                core id:0 g_vqid[0]:0 g_vqid[16]:1
                core id:7 g_vqid[7]:2 g_vqid[23]:3
                cores:0 packet count:0

                test memory size: 0x2000 packets number:16 packet size: 512
                Local mem phy addr: 0 addr1: 0 g_jobs[0]:0x17a807000


                Using long format to test: packet size 512 Bytes, MEM_TO_PCI
                Master coreid: 0 ready, now!
                Processing coreid: 7 ready, now!

                =>Time Spend :4000.005 ms rcvd cnt:1310720 pkt_cnt:0
                Rate: 1342.176 Mbps OR 327.680 Kpps
                processed on core 7 pkt cnt: 1310720

                =>Time Spend :4000.000 ms rcvd cnt:1376256 pkt_cnt:0
                Rate: 1409.286 Mbps OR 344.064 Kpps
                processed on core 7 pkt cnt: 1376256


 Note:

1. The current QDMA demo code read/write only 4KB area, that yield best
bandwidth number. To test read/write big memory size, you can optionally pass
--pci_size (size in byte)

e.g for 2MB PCI use:  --pci_size=2147483648

2. PCIe address should be:
   - mapped to valid memory area.
   - bar address of device which is NOT used by driver.

LATENCY_TEST
------------
add "--latency_test" for testing latency

MEMCOPY
-------

One can use --memcpy option to use core instead of qdma engine for comparision reasons.

SCATTER GATHER
--------------

One can use --scatter option to use Scatter Gather buffers instead of normal single buffer.
It is perferable to build dpdk with CONFIG_RTE_LIBRTE_DPAA2_USE_PHYS_IOVA=y

DATA VALIDATION
---------------

One can use --validate option to validate data once DMA operation is performed.
