Module dma_axi_write_simple

This document contains technical documentation for the dma_axi_write_simple module. This module has a register interface, so make sure to study the register interface documentation as well as this top-level document. To browse the source code, visit the repository on GitHub.

This module contains a simple Direct Memory Access (DMA) component for streaming data from FPGA to DDR memory over AXI. The implementation is optimized for

  1. very low resource usage, and

  2. maximum AXI/data throughput.

Being simplified, however, it has the following limitations:

  1. Can only write to a contiguous ring buffer space in DDR. Has no scatter-gather capability.

  2. Does not support data strobing or narrow AXI transfers. All addresses must be aligned with the AXI data width.

  3. Uses a static compile-time packet length, with no support for partial packets.

  4. Packet length must be a power of two.

These limitations and the simplicity of the design are intentional. This is what enables the low resource usage and high throughput.
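The alignment and power-of-two limitations above can be checked before a build or before software hands a buffer to the core. The following is a minimal sketch in Python, not part of the module or its driver; the function names and the `address` argument are hypothetical and chosen only for illustration.

```python
def is_power_of_two(value: int) -> bool:
    """Return True if 'value' is a positive power of two."""
    return value > 0 and (value & (value - 1)) == 0


def check_configuration(packet_length_beats: int, axi_data_width: int, address: int) -> list:
    """
    Return a list of problems found; an empty list means the values pass
    the checks. 'address' is a hypothetical byte address that software
    intends to give to the core.
    """
    problems = []

    if not is_power_of_two(packet_length_beats):
        problems.append("packet_length_beats must be a power of two")

    # Narrow transfers and strobing are not supported, so every address
    # must be a multiple of the AXI beat size in bytes.
    bytes_per_axi_beat = axi_data_width // 8
    if address % bytes_per_axi_beat != 0:
        problems.append("address must be aligned with the AXI data width")

    return problems
```

For example, `check_configuration(16, 64, 0x1000)` returns an empty list, while a packet length of 12 or an address of 0x1004 on a 64-bit bus is reported as a problem.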

C++ driver

There is a complete C++ driver available in the cpp sub-folder in the repository. The class provides a convenient API for setting up the module and receiving stream data from the FPGA. See the header file for documentation.

Simulate and build FPGA with register artifacts

This module is controlled over a register bus, with code generated by hdl-registers. See Register interface for register documentation.

Generated register code artifacts are not checked in to the repository. The recommended way to use hdl-modules is with tsfpga (see Getting started), in which case register code is always generated and kept up to date automatically. This is by far the most convenient and portable solution.

If you don’t want to use tsfpga, you can integrate hdl-registers code generation in your build/simulation flow or use the hard coded artifacts below (not recommended).

Hard coded artifacts

Not recommended, but if you don’t want to use tsfpga or hdl-registers, these generated VHDL artifacts can be included in the dma_axi_write_simple library for simulation and synthesis:

  1. regs_src/dma_axi_write_simple_regs_pkg.vhd

  2. regs_src/dma_axi_write_simple_register_record_pkg.vhd

  3. regs_src/dma_axi_write_simple_reg_file.vhd

  4. regs_sim/dma_axi_write_simple_register_read_write_pkg.vhd

The first three are source files that shall be included in your simulation project as well as your build project. The last one is a simulation file that shall be included only in your simulation project.

These generated C++ artifacts can be used to control the module from software:

  1. include/i_dma_axi_write_simple.h

  2. include/dma_axi_write_simple.h

  3. dma_axi_write_simple.cpp

Warning

When copy-pasting generated artifacts, there is a large risk that things go out of sync when e.g. versions are bumped. An automated solution with tsfpga is highly recommended.

Register interface

This module is controlled and monitored over a register bus. Please see separate HTML page for register documentation. Register code is generated using hdl-registers based on the regs_dma_axi_write_simple.toml data file.

dma_axi_write_simple.vhd

View source code on GitHub.

component dma_axi_write_simple is
  generic (
    address_width : axi_addr_width_t;
    stream_data_width : axi_data_width_t;
    axi_data_width : axi_data_width_t;
    packet_length_beats : positive;
    enable_axi3 : boolean
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    stream_ready : out std_ulogic;
    stream_valid : in std_ulogic;
    stream_data : in std_ulogic_vector;
    --# {{}}
    regs_up : out dma_axi_write_simple_regs_up_t;
    regs_down : in dma_axi_write_simple_regs_down_t;
    interrupt : out std_ulogic;
    --# {{}}
    axi_write_m2s : out axi_write_m2s_t;
    axi_write_s2m : in axi_write_s2m_t
  );
end component;

Main implementation of the simple DMA functionality. This entity is not suitable for instantiation in a user design. Use e.g. dma_axi_write_simple_axi_lite.vhd instead.

Packet length

The packet_length_beats generic specifies the packet length in terms of number of input stream beats. When one packet of streaming data has been written to DDR, the write_done interrupt will trigger and the buffer_written_address register will be updated. This indicates to the software that there is data in the buffer that can be read.

Note

The packet length is a compile-time parameter. It cannot be changed during runtime and there is no support for writing or clearing partial packets.

This saves a lot of resources and is part of the simple nature of this DMA core.

If the packet length specified by the user equates to more than one maximum-length AXI burst, the core will perform burst splitting internally.
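To get a feel for how many AXI bursts one packet turns into, the arithmetic can be sketched in Python. This is an illustration only and not taken from the core's source code. It assumes the AXI protocol limits of 16 beats per incrementing burst in AXI3 and 256 beats in AXI4, and it does not model any other protocol constraint (such as the 4 KiB address boundary rule). The function name is hypothetical.

```python
import math


def bursts_per_packet(packet_length_beats, stream_data_width, axi_data_width, enable_axi3=False):
    """
    Return a tuple (number_of_bursts, beats_per_burst) for one packet.
    'packet_length_beats' counts input stream beats, as the generic does.
    """
    packet_bits = packet_length_beats * stream_data_width
    if packet_bits % axi_data_width != 0:
        raise ValueError("packet must be a whole number of AXI beats")

    axi_beats_per_packet = packet_bits // axi_data_width

    # Protocol limit for incrementing bursts (assumed to be the only limit here).
    max_burst_beats = 16 if enable_axi3 else 256
    beats_per_burst = min(axi_beats_per_packet, max_burst_beats)

    return math.ceil(axi_beats_per_packet / beats_per_burst), beats_per_burst
```

Under these assumptions, a packet of 2048 beats with 64-bit stream and AXI widths would be split into 8 bursts of 256 beats in AXI4 mode, or 128 bursts of 16 beats in AXI3 mode.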

Data width conversion

The core supports data width conversion between the input stream and the AXI bus. If the stream_data_width and axi_data_width are not equal, width_conversion.vhd will be instantiated for a lightweight conversion. Note that axi_data_width must be the native width of the AXI port.
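As a software model of what a width conversion does to the data, the sketch below repacks a list of words from one width to another. It is not a description of width_conversion.vhd. In particular, placing the first input word in the least significant bits of the output word is an assumption made for this example, and the function name is hypothetical.

```python
def convert_width(words, input_width, output_width):
    """
    Repack a list of 'input_width'-bit integers as 'output_width'-bit integers.
    Assumption: the first input word ends up in the lowest bits.
    """
    if max(input_width, output_width) % min(input_width, output_width) != 0:
        raise ValueError("widths must be integer multiples of each other")

    if output_width >= input_width:
        # Upconversion: combine 'ratio' input words into one output word.
        ratio = output_width // input_width
        if len(words) % ratio != 0:
            raise ValueError("input does not fill a whole number of output words")

        result = []
        for start in range(0, len(words), ratio):
            packed = 0
            for index, word in enumerate(words[start:start + ratio]):
                packed |= word << (index * input_width)
            result.append(packed)
        return result

    # Downconversion: split each input word into 'ratio' output words.
    ratio = input_width // output_width
    mask = (1 << output_width) - 1
    return [(word >> (index * output_width)) & mask for word in words for index in range(ratio)]
```

With a 64-bit stream and a 32-bit AXI bus, every stream beat becomes two AXI beats in this model. With a 128-bit AXI bus, two stream beats are needed to form one AXI beat.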

Resource usage

The core has a simple design with the goal of low resource utilization in mind. See Resource utilization for some build numbers. These numbers are very low compared to many other DMA implementations.

The special case when packet_length_beats is 1 has an optimized implementation that gives even lower resource usage than the general case. This comes at the cost of quite poor memory performance, since every data beat becomes an AXI burst in that case.

AXI/data throughput

The core has a one-cycle overhead per packet, meaning that for each packet the input stream will stall (stream_ready = 0) for one clock cycle. This assumes that AWREADY and WREADY are high. If they are not, their stall will be propagated to the stream.

This performance should be enough for even the most demanding applications. The one-cycle overhead could theoretically be optimized away, but it is quite likely that downstream AXI interconnect infrastructure has some overhead for each address transaction anyway. In other words, the one-cycle overhead in this core is probably not what limits the overall throughput.
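The effect of the one-cycle overhead on stream throughput can be estimated with a one-line formula. The sketch below is a back-of-the-envelope model, not a measurement. It assumes that the stream always has data available, that AWREADY and WREADY are always high, and that the stream and AXI data widths are equal. The function name is hypothetical.

```python
def stream_efficiency(packet_length_beats: int) -> float:
    """
    Fraction of clock cycles in which the stream can transfer a beat,
    given one stall cycle per packet.
    """
    return packet_length_beats / (packet_length_beats + 1)


if __name__ == "__main__":
    for length in [16, 2048, 16384]:
        print(f"packet_length_beats = {length}: {100 * stream_efficiency(length):.2f} %")
```

In this model a packet length of 16 beats gives 16/17, roughly 94 % of the ideal throughput, and the loss shrinks quickly as the packet length grows.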

If the memory buffer is full, the stream will stall until there is space. When the software writes an updated buffer_read_address register indicating available space, the stream will start after two clock cycles.
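From the software side, the ring buffer can be reasoned about with the two address registers mentioned in this document, buffer_written_address and buffer_read_address. The Python sketch below models that bookkeeping. It is not the C++ driver and should not be read as its API. The `buffer_start` and `buffer_end` arguments are hypothetical names for the bounds of the buffer, addresses are assumed to lie in the range from `buffer_start` up to but not including `buffer_end`, and equal read and written addresses are assumed to mean that the buffer is empty.

```python
def bytes_available(read_address, written_address, buffer_start, buffer_end):
    """Number of bytes written by the core that software has not yet consumed."""
    buffer_size = buffer_end - buffer_start
    return (written_address - read_address) % buffer_size


def readable_segments(read_address, written_address, buffer_start, buffer_end):
    """
    Return a list of (start, end) address pairs, end exclusive, that cover
    the unread data. Two segments are returned when the data wraps around
    the end of the buffer.
    """
    if written_address >= read_address:
        segments = [(read_address, written_address)]
    else:
        segments = [(read_address, buffer_end), (buffer_start, written_address)]

    # Drop zero-length segments, e.g. when the buffer is empty.
    return [(start, end) for start, end in segments if end > start]
```

After processing the returned segments, software would write the consumed address to buffer_read_address, which lets a stalled stream continue.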

AXI behavior

The core is designed to be as well-behaved as possible in an AXI sense:

  1. AXI bursts of the maximum length possible will be used.

  2. The AW transaction is only initiated once we have at least one W beat available.

  3. BREADY is always high.

This gives very good AXI performance.

W channel block

Related to bullet point 2 above, the core does NOT accumulate a whole burst in order to guarantee no holes in the data. This means that an AW and a few W transactions can happen, after which the stream can stop for a while and block the AXI bus before the burst is finished.

This can be problematic if the downstream AXI slave is a crossbar/interconnect that arbitrates between multiple AXI masters.

It is up to the user to make sure that either,

  1. The stream never stops within a packet, so that optimal AXI performance is reached.

  2. Or, the downstream AXI slave can handle holes without impacting performance. axi_write_throttle.vhd is designed to help with this.
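To illustrate how gaps in the stream translate into a blocked AXI bus, the following Python sketch walks through a per-cycle stream_valid pattern for a single burst. It is a simplified model under stated assumptions, not a simulation of the core: AWREADY and WREADY are assumed to always be high, the per-packet overhead is ignored, and the function name is hypothetical.

```python
def burst_occupancy(stream_valid, burst_length_beats):
    """
    Return (start_cycle, end_cycle, hole_cycles) for the first burst.
    The burst starts at the first valid beat, since the AW transaction is
    issued once one W beat is available. Every non-valid cycle after that,
    until the last beat, is a hole during which the bus is held.
    """
    start_cycle = None
    beats = 0
    hole_cycles = 0

    for cycle, valid in enumerate(stream_valid):
        if valid:
            if start_cycle is None:
                start_cycle = cycle
            beats += 1
            if beats == burst_length_beats:
                return start_cycle, cycle, hole_cycles
        elif start_cycle is not None:
            hole_cycles += 1

    raise ValueError("stream ended before the burst completed")
```

For the pattern `[0, 1, 1, 0, 0, 1, 1]` and a burst of four beats, the model reports a burst that occupies cycles 1 through 6, two of which are holes. A stream that never stops within a packet gives zero holes.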

AXI3

Setting the enable_axi3 generic will make the core compliant with AXI3 instead of AXI4. The core does not use any of the ID fields (AWID, WID, BID) so the only difference is the burst length limitation.

dma_axi_write_simple_axi_lite.vhd

View source code on GitHub.

component dma_axi_write_simple_axi_lite is
  generic (
    address_width : axi_addr_width_t;
    stream_data_width : axi_data_width_t;
    axi_data_width : axi_data_width_t;
    packet_length_beats : positive;
    enable_axi3 : boolean
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    stream_ready : out std_ulogic;
    stream_valid : in std_ulogic;
    stream_data : in std_ulogic_vector;
    --# {{}}
    regs_m2s : in axi_lite_m2s_t;
    regs_s2m : out axi_lite_s2m_t;
    interrupt : out std_ulogic;
    --# {{}}
    axi_write_m2s : out axi_write_m2s_t;
    axi_write_s2m : in axi_write_s2m_t
  );
end component;

Top level for the simple DMA module, with an AXI-Lite register interface. This top level is suitable for instantiation in a user design. It integrates dma_axi_write_simple.vhd and an AXI-Lite register file.

See dma_axi_write_simple.vhd for more documentation.

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_dma_axi_write_simple.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for dma_axi_write_simple_axi_lite netlist builds.

address_width  stream_data_width  axi_data_width  packet_length_beats  Total LUTs  FFs  Maximum logic level
-------------  -----------------  --------------  -------------------  ----------  ---  -------------------
29             64                 64              1                    156         207  16
29             64                 64              16                   157         226  12
29             64                 64              2048                 132         218  11
29             64                 64              16384                134         218  10
29             64                 32              16384                171         320  11
29             64                 128             16384                198         410  11

dma_axi_write_simple_sim_pkg.vhd

View source code on GitHub.

Package with functions to simulate and check the DMA functionality.