.. _module_dma_axi_write_simple:

Module dma_axi_write_simple
===========================

This document contains technical documentation for the ``dma_axi_write_simple`` module.
This module has a register interface, so make sure to study the
:ref:`register interface documentation ` as well as this top-level document.
To browse the source code, visit the `repository on GitHub `__.

This module contains an open-source Direct Memory Access (DMA) component for streaming data
from FPGA to DDR memory over AXI.
This is sometimes called "AXI DMA S2MM".

The implementation is optimized for

1. very low :ref:`resource usage `, and
2. maximum :ref:`AXI/data throughput `.

Being simplified, however, it has the following limitations:

1. Can only write to continuous ring buffer space in DDR.
   No scatter-gather support.
2. Does not support data strobing or narrow AXI transfers.
   All addresses must be aligned with the AXI data width.
3. Uses a static compile-time packet length, with no support for partial packets.
4. Packet length must be a power of two.

These limitations and the simplicity of the design are intentional.
This is what enables the low resource usage and high throughput.


C++ driver
----------

There is a complete C++ driver available in the `cpp sub-folder `__ in the repository.
The class provides a convenient API for setting up the module and receiving stream data
from the FPGA.
It supports an interrupt-based as well as a polling-based workflow.
See the header file for documentation.


Simulate and build FPGA with register artifacts
-----------------------------------------------

This module is controlled over a register bus, with code generated by `hdl-registers `_.
See :ref:`dma_axi_write_simple.register_interface` for register documentation.
Generated register code artifacts are not checked in to the repository.

The recommended way to use hdl-modules is with `tsfpga `__ (see :ref:`getting_started`),
in which case register code is always generated and kept up to date automatically.
This is by far the most convenient and portable solution.
If you don't want to use tsfpga, you can integrate hdl-registers code generation in your
build/simulation flow or use the hard coded artifacts below (not recommended).


Hard coded artifacts
____________________

Not recommended, but if you don't want to use tsfpga or hdl-registers, these generated
VHDL artifacts can be included in the ``dma_axi_write_simple`` library for simulation
and synthesis:

1. :download:`regs_src/dma_axi_write_simple_regs_pkg.vhd `
2. :download:`regs_src/dma_axi_write_simple_register_record_pkg.vhd `
3. :download:`regs_src/dma_axi_write_simple_register_file_axi_lite.vhd `
4. :download:`regs_sim/dma_axi_write_simple_register_read_write_pkg.vhd `

The first three are source files that shall be included in your simulation as well as
your build project.
The last one is a simulation file that shall be included only in your simulation project.

These generated C++ artifacts can be used to control the module from software:

1. :download:`include/i_dma_axi_write_simple.h `
2. :download:`include/dma_axi_write_simple.h `
3. :download:`dma_axi_write_simple.cpp `

.. warning::
  When copy-pasting generated artifacts, there is a large risk that things go out of sync
  when e.g. versions are bumped.
  An automated solution with :ref:`tsfpga ` is highly recommended.


.. _dma_axi_write_simple.register_interface:

Register interface
------------------

This module is controlled and monitored over a register bus.
Please see :download:`separate HTML page ` for register documentation.
Register code is generated using `hdl-registers `_ based on the
`regs_dma_axi_write_simple.toml `_ data file.


.. _dma_axi_write_simple.dma_axi_write_simple:

dma_axi_write_simple.vhd
------------------------

`View source code on GitHub `__.

.. symbolator::

  component dma_axi_write_simple is
    generic (
      address_width : axi_address_width_t;
      stream_data_width : axi_data_width_t;
      axi_data_width : axi_data_width_t;
      packet_length_beats : positive;
      enable_axi3 : boolean;
      write_done_aggregate_count : positive;
      write_done_aggregate_ticks : positive
    );
    port (
      clk : in std_ulogic;
      --# {{}}
      stream_ready : out std_ulogic;
      stream_valid : in std_ulogic;
      stream_data : in std_ulogic_vector;
      --# {{}}
      regs_up : out dma_axi_write_simple_regs_up_t;
      regs_down : in dma_axi_write_simple_regs_down_t;
      interrupt : out std_ulogic;
      --# {{}}
      axi_write_m2s : out axi_write_m2s_t;
      axi_write_s2m : in axi_write_s2m_t
    );
  end component;

Main implementation of the simple DMA functionality.
This entity is not suitable for instantiation in a user design, use instead e.g.
:ref:`dma_axi_write_simple.dma_axi_write_simple_axi_lite`.


Packet length
_____________

The ``packet_length_beats`` generic specifies the packet length in terms of number of
input ``stream`` beats.
When one packet of streaming data has been written to DDR, the ``write_done`` interrupt
will trigger and the ``buffer_written_address`` register will be updated.
This indicates to the software that there is data in the buffer that can be read.

.. note::
  The packet length is a compile-time parameter.
  It cannot be changed during runtime and there is no support for writing or clearing
  partial packets.
  This saves a lot of resources and is part of the simple nature of this DMA core.

If the packet length specified by the user equates to more than one maximum-length
AXI burst, the core will perform burst splitting internally.


Data width conversion
_____________________

The core supports data width conversion between the input ``stream`` and the AXI bus.
If the ``stream_data_width`` and ``axi_data_width`` are not equal,
:ref:`common.width_conversion` will be instantiated for a lightweight conversion.

Note that ``axi_data_width`` must be the native width of the AXI port.


.. _dma_axi_write_simple_resource_usage:

Resource usage
______________

The core has a simple design with the goal of low resource utilization in mind.
See :ref:`dma_axi_write_simple.dma_axi_write_simple_axi_lite.resource_utilization`
for some build numbers.
These numbers are incredibly low compared to some other implementations.

The special case when ``packet_length_beats`` is 1 has an optimized implementation that
gives even lower resource usage than the general case.
This comes at the cost of quite poor memory performance, since every data beat becomes
an AXI burst in that case.


.. _dma_axi_write_simple_throughput:

AXI/data throughput
___________________

The core has a one-cycle overhead per packet, meaning that for each packet, the input
``stream`` will stall (``stream_ready = 0``) for one clock cycle.
This is assuming that ``AWREADY`` and ``WREADY`` are high.
If they are not, their stall will be propagated to the ``stream``.

This performance should be enough for even the most demanding applications.
The one-cycle overhead could theoretically be optimized away, but it is quite likely that
downstream AXI interconnect infrastructure has some overhead for each address
transaction anyway.
I.e. the one-cycle overhead in this core is probably not limiting the throughput overall.

If the memory buffer is full, the ``stream`` will stall until there is space.
When the software writes an updated ``buffer_read_address`` register indicating available
space, the ``stream`` will start after two clock cycles.


AXI behavior
____________

The core is designed to be as well-behaved as possible in an AXI sense:

1. AXI bursts of the maximum length possible will be used.
2. The ``AW`` transaction is only initiated once we have at least one ``W`` beat available.
3. ``BREADY`` is always high.

This gives very good AXI performance.


W channel block
~~~~~~~~~~~~~~~

Related to bullet point 2 above, the core does NOT accumulate a whole burst in order to
guarantee no holes in the data.
Meaning, it is possible that an ``AW`` and a few ``W`` transactions happen, but then the
``stream`` might stop for a while and block the AXI bus before the burst is finished.
This can be problematic if the downstream AXI slave is a crossbar/interconnect that
arbitrates between multiple AXI masters.

It is up to the user to make sure that either,

1. The ``stream`` stopping within a packet is rare enough that sufficient AXI performance
   is reached.
2. Or, the downstream AXI slave can handle holes without impacting performance.

:ref:`axi.axi_write_throttle` is designed to help with this.


AXI3
~~~~

Setting the ``enable_axi3`` generic will make the core compliant with AXI3 instead of AXI4.
The core does not use any of the ID fields (``AWID``, ``WID``, ``BID``) so the only
difference is the burst length limitation.


``write_done`` interrupt aggregation
____________________________________

The ``write_done`` interrupt bit is triggered every time a packet has been written to
memory (see :ref:`dma_axi_write_simple.register_interface`).
If an interrupt-based control flow is used, this can lead to a high interrupt rate if
packets are small but the data rate is high.
This can be problematic for the CPU.

The generics ``write_done_aggregate_count`` and ``write_done_aggregate_ticks`` can be used
to aggregate the interrupt event so it is triggered more sparsely.
See :ref:`common.event_aggregator` for details.

Commonly, both generics are set to ensure that the interrupt is triggered:

1. Not too often (too little data per interrupt, too high overhead in interrupt manager).
2. Not too seldom (too much data per interrupt, buffer might fill up and stream might stall).
3. Not too delayed (too high latency in data flow).


.. _dma_axi_write_simple.dma_axi_write_simple_axi_lite:

dma_axi_write_simple_axi_lite.vhd
---------------------------------

`View source code on GitHub `__.

.. symbolator::

  component dma_axi_write_simple_axi_lite is
    generic (
      address_width : axi_address_width_t;
      stream_data_width : axi_data_width_t;
      axi_data_width : axi_data_width_t;
      packet_length_beats : positive;
      enable_axi3 : boolean;
      write_done_aggregate_count : positive;
      write_done_aggregate_ticks : positive
    );
    port (
      clk : in std_ulogic;
      --# {{}}
      stream_ready : out std_ulogic;
      stream_valid : in std_ulogic;
      stream_data : in std_ulogic_vector;
      --# {{}}
      regs_m2s : in axi_lite_m2s_t;
      regs_s2m : out axi_lite_s2m_t;
      interrupt : out std_ulogic;
      --# {{}}
      axi_write_m2s : out axi_write_m2s_t;
      axi_write_s2m : in axi_write_s2m_t
    );
  end component;

Top level for the simple DMA module, with an **AXI-Lite** register interface.
This top level is suitable for instantiation in a user design.
It integrates :ref:`dma_axi_write_simple.dma_axi_write_simple` and an AXI-Lite register file.

See :ref:`dma_axi_write_simple.dma_axi_write_simple` for more documentation.


.. _dma_axi_write_simple.dma_axi_write_simple_axi_lite.resource_utilization:

Resource utilization
____________________

This entity has `netlist builds `__ set up with `automatic size checkers `__ in
`module_dma_axi_write_simple.py `__.
The following table lists the resource utilization for the entity, depending on
generic configuration.

.. list-table:: Resource utilization for **dma_axi_write_simple_axi_lite** netlist builds.
  :header-rows: 1

  * - Generics
    - Total LUTs
    - FFs
    - Maximum logic level
  * - address_width = 29

      stream_data_width = 64

      axi_data_width = 64

      packet_length_beats = 1
    - 156
    - 207
    - 16
  * - address_width = 29

      stream_data_width = 64

      axi_data_width = 64

      packet_length_beats = 16
    - 157
    - 226
    - 12
  * - address_width = 29

      stream_data_width = 64

      axi_data_width = 64

      packet_length_beats = 2048
    - 132
    - 218
    - 11
  * - address_width = 29

      stream_data_width = 64

      axi_data_width = 64

      packet_length_beats = 16384
    - 134
    - 218
    - 10
  * - address_width = 29

      stream_data_width = 64

      axi_data_width = 32

      packet_length_beats = 16384
    - 171
    - 320
    - 11
  * - address_width = 29

      stream_data_width = 64

      axi_data_width = 128

      packet_length_beats = 16384
    - 198
    - 410
    - 11
  * - address_width = 29

      stream_data_width = 64

      axi_data_width = 64

      packet_length_beats = 1024

      write_done_aggregate_count = 512

      write_done_aggregate_ticks = 262144
    - 156
    - 247
    - 12


.. _dma_axi_write_simple.dma_axi_write_simple_sim_pkg:

dma_axi_write_simple_sim_pkg.vhd
--------------------------------

`View source code on GitHub `__.

Package with functions to simulate and check the DMA functionality.
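
When choosing generics, or when writing expected values for a simulation check, it can help
to estimate the packet-level numbers described for
:ref:`dma_axi_write_simple.dma_axi_write_simple` on the host.
The following is a minimal C++ sketch of that arithmetic.
It is not part of the module, its simulation package or its C++ driver, and the function
names are hypothetical.
It assumes the standard AXI burst limits (16 beats for AXI3, 256 beats for AXI4) and that no
burst crosses a 4 KiB boundary; whether the core applies exactly these limits should be
checked against the source code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical planning helpers, not part of the module or its C++ driver.

// Number of AXI bursts needed to write one packet.
// Assumption: bursts are capped by the protocol limit (16 beats for AXI3,
// 256 beats for AXI4) and by the AXI rule that a burst may not cross 4 KiB.
inline std::uint64_t bursts_per_packet(std::uint64_t stream_data_width,
                                       std::uint64_t axi_data_width,
                                       std::uint64_t packet_length_beats,
                                       bool enable_axi3) {
  const std::uint64_t stream_bytes = stream_data_width / 8;
  const std::uint64_t axi_bytes = axi_data_width / 8;
  const std::uint64_t packet_bytes = packet_length_beats * stream_bytes;

  // Packet length must be a power of two.
  assert(packet_length_beats != 0 &&
         (packet_length_beats & (packet_length_beats - 1)) == 0);
  // No data strobing, so a packet is assumed to fill whole AXI beats.
  assert(axi_bytes != 0 && packet_bytes % axi_bytes == 0);

  const std::uint64_t axi_beats = packet_bytes / axi_bytes;
  const std::uint64_t protocol_limit = enable_axi3 ? 16 : 256;
  const std::uint64_t max_burst_beats =
      std::min<std::uint64_t>(protocol_limit, 4096 / axi_bytes);

  // Round up, so a packet shorter than one full burst still counts as one.
  return (axi_beats + max_burst_beats - 1) / max_burst_beats;
}

// Fraction of clock cycles in which the input stream can transfer data,
// given one stall cycle per packet and AWREADY/WREADY always high.
inline double stream_efficiency(std::uint64_t packet_length_beats) {
  return static_cast<double>(packet_length_beats) /
         static_cast<double>(packet_length_beats + 1);
}
```

With 64-bit ``stream`` and AXI widths, ``bursts_per_packet(64, 64, 2048, false)`` evaluates
to 8 under these assumptions, and ``stream_efficiency(16)`` is 16/17, roughly 94 %.
The ``packet_length_beats = 1`` case gives 50 %, which matches the note above that this
configuration trades memory performance for resource usage.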