Module axi

This document contains technical documentation for the axi module.

axi_address_fifo.vhd

View source code on gitlab.com.

component axi_address_fifo is
  generic (
    id_width : natural;
    addr_width : positive;
    asynchronous : boolean;
    depth : natural;
    ram_type : ram_style_t
  );
  port (
    clk : in std_ulogic;
    -- Only need to assign the clock if generic asynchronous is "True"
    clk_input : in std_ulogic;
    --# {{}}
    input_m2s : in axi_m2s_a_t;
    input_s2m : out axi_s2m_a_t;
    --# {{}}
    output_m2s : out axi_m2s_a_t;
    output_s2m : in axi_s2m_a_t
  );
end component;

FIFO for AXI address channel (AR or AW). Can be used as clock crossing by setting the asynchronous generic. By setting the width generics, the bus is packed optimally so that no unnecessary resources are consumed.

Note

If asynchronous operation is enabled, the constraints of asynchronous_fifo.vhd must be used.

axi_b_fifo.vhd

View source code on gitlab.com.

component axi_b_fifo is
  generic (
    id_width : natural;
    asynchronous : boolean;
    depth : natural;
    ram_type : ram_style_t
  );
  port (
    clk : in std_ulogic;
    -- Only need to assign the clock if generic asynchronous is "True"
    clk_input : in std_ulogic;
    --# {{}}
    input_m2s : in axi_m2s_b_t;
    input_s2m : out axi_s2m_b_t;
    --# {{}}
    output_m2s : out axi_m2s_b_t;
    output_s2m : in axi_s2m_b_t
  );
end component;

FIFO for AXI write response channel (B). Can be used as clock crossing by setting the asynchronous generic. By setting the id_width generic, the bus is packed optimally so that no unnecessary resources are consumed.

Note

If asynchronous operation is enabled, the constraints of asynchronous_fifo.vhd must be used.

axi_lite_cdc.vhd

View source code on gitlab.com.

component axi_lite_cdc is
  generic (
    data_width : positive;
    addr_width : positive;
    fifo_depth : positive;
    ram_type : ram_style_t
  );
  port (
    clk_master : in std_ulogic;
    master_m2s : in axi_lite_m2s_t;
    master_s2m : out axi_lite_s2m_t;
    --# {{}}
    clk_slave : in std_ulogic;
    slave_m2s : out axi_lite_m2s_t;
    slave_s2m : in axi_lite_s2m_t
  );
end component;

Clock domain crossing of a full AXI-Lite bus (read and write) using asynchronous FIFOs for the different channels. By setting the width generics, the bus is packed optimally so that no unnecessary resources are consumed.

Note

The constraints of asynchronous_fifo.vhd must be used.

axi_lite_mux.vhd

View source code on gitlab.com.

component axi_lite_mux is
  generic (
    slave_addrs : addr_and_mask_vec_t
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    axi_lite_m2s : in axi_lite_m2s_t;
    axi_lite_s2m : out axi_lite_s2m_t;
    --# {{}}
    axi_lite_m2s_vec : out axi_lite_m2s_vec_t;
    axi_lite_s2m_vec : in axi_lite_s2m_vec_t
  );
end component;

AXI-Lite mux, aka simple 1-to-N crossbar.

The slave_addrs generic is a list of base address configurations for the N slaves. Each entry consists of a base address, along with a mask that will be used to match the master address with a slave. Only the bits that are asserted in the mask are taken into account when matching.

If the address requested by the master does not match any slave, this entity will send an AXI decode error (DECERR) on the response channel. Proper AXI handshaking is still performed, so the master will not be stalled.
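The matching rule described above can be illustrated with a small standalone package. This is only a sketch: the type and function names (example_addr_t, example_addr_and_mask_t, decode_example) are hypothetical stand-ins and not the declarations used by axi_lite_mux.vhd, and a 32-bit address is assumed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package addr_match_example_pkg is

  -- Hypothetical stand-ins for the module's address types. 32-bit addresses are assumed.
  subtype example_addr_t is unsigned(32 - 1 downto 0);

  type example_addr_and_mask_t is record
    addr : example_addr_t;
    mask : example_addr_t;
  end record;

  type example_addr_and_mask_vec_t is
    array (natural range <>) of example_addr_and_mask_t;

  -- Returns the position (counted from zero) of the first slave that matches the address,
  -- or slaves'length if no slave matches (the case where a decode error would be sent).
  function decode_example(
    addr : example_addr_t;
    slaves : example_addr_and_mask_vec_t
  ) return natural;

end package;

package body addr_match_example_pkg is

  function decode_example(
    addr : example_addr_t;
    slaves : example_addr_and_mask_vec_t
  ) return natural is
  begin
    for idx in slaves'range loop
      -- Only the bits that are asserted in the mask take part in the comparison.
      if (addr and slaves(idx).mask) = (slaves(idx).addr and slaves(idx).mask) then
        return idx - slaves'low;
      end if;
    end loop;

    return slaves'length;
  end function;

end package body;
```

With an entry that has base address x"0000_1000" and mask x"0000_F000", for example, a request to x"0000_1234" matches, since only bits 15 down to 12 are compared.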

axi_lite_pipeline.vhd

View source code on gitlab.com.

component axi_lite_pipeline is
  generic (
    data_width : positive;
    addr_width : positive;
    -- Settings to the handshake_pipeline blocks. These default settings (the same as
    -- handshake_pipeline's defaults) give full throughput and the lowest logic depth.
    -- They can be changed from default in order to decrease logic utilization.
    full_throughput : boolean;
    pipeline_control_signals : boolean
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    master_m2s : in axi_lite_m2s_t;
    master_s2m : out axi_lite_s2m_t;
    --# {{}}
    slave_m2s : out axi_lite_m2s_t;
    slave_s2m : in axi_lite_s2m_t
  );
end component;

Pipelining of a full AXI-Lite bus (read and write), with the goal of improving timing on the data and/or control signals.

The default settings will result in full skid-aside buffers, which pipeline both the data and control signals. However, the generics to handshake_pipeline.vhd can be modified to get a simpler implementation that results in lower resource utilization.

axi_lite_pkg.vhd

View source code on gitlab.com.

Data types for working with AXI4-Lite interfaces. Based on the document “ARM IHI 0022E (ID022613): AMBA AXI and ACE Protocol Specification”, available here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0022e/

axi_lite_simple_read_crossbar.vhd

View source code on gitlab.com.

component axi_lite_simple_read_crossbar is
  generic(
    num_inputs : positive
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    input_ports_m2s : in axi_lite_read_m2s_vec_t;
    input_ports_s2m : out axi_lite_read_s2m_vec_t;
    --# {{}}
    output_m2s : out axi_lite_read_m2s_t;
    output_s2m : in axi_lite_read_s2m_t
  );
end component;

Simple N-to-1 crossbar for connecting multiple AXI-Lite read masters to one port. This is a wrapper around axi_simple_read_crossbar.vhd. See that entity for details.

axi_lite_simple_write_crossbar.vhd

View source code on gitlab.com.

component axi_lite_simple_write_crossbar is
  generic(
    num_inputs : positive
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    input_ports_m2s : in axi_lite_write_m2s_vec_t;
    input_ports_s2m : out axi_lite_write_s2m_vec_t;
    --# {{}}
    output_m2s : out axi_lite_write_m2s_t;
    output_s2m : in axi_lite_write_s2m_t
  );
end component;

Simple N-to-1 crossbar for connecting multiple AXI-Lite write masters to one port. This is a wrapper around axi_simple_write_crossbar.vhd. See that entity for details.

axi_lite_to_vec.vhd

View source code on gitlab.com.

component axi_lite_to_vec is
  generic (
    axi_lite_slaves : addr_and_mask_vec_t;
    -- Set to false in order to insert a CDC for this slave.
    -- Must also set clk_axi_lite_vec.
    clocks_are_the_same : boolean_vector;
    cdc_fifo_depth : positive;
    cdc_ram_type : ram_style_t;
    -- Optionally insert a pipeline stage after the axi_lite_mux for each slave
    pipeline_slaves : boolean
  );
  port (
    --# {{}}
    clk_axi_lite : in std_ulogic;
    axi_lite_m2s : in axi_lite_m2s_t;
    axi_lite_s2m : out axi_lite_s2m_t;
    --# {{}}
    -- Only need to set if different from clk_axi_lite
    clk_axi_lite_vec : in std_ulogic_vector;
    axi_lite_m2s_vec : out axi_lite_m2s_vec_t;
    axi_lite_s2m_vec : in axi_lite_s2m_vec_t
  );
end component;

Convenience wrapper for splitting and CDC’ing a register bus based on generics. The goal is to split a register bus, and have each resulting AXI-Lite bus in the same clock domain as the module that uses the registers. Typically used in chip top levels.

Instantiates axi_lite_mux.vhd and axi_lite_cdc.vhd.

axi_pkg.vhd

View source code on gitlab.com.

Data types for working with AXI4 interfaces.

Based on the document “ARM IHI 0022E (ID022613): AMBA AXI and ACE Protocol Specification”, available here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0022e/index.html

axi_r_fifo.vhd

View source code on gitlab.com.

component axi_r_fifo is
  generic (
    asynchronous : boolean;
    id_width : natural;
    data_width : positive;
    depth : natural;
    enable_packet_mode : boolean;
    ram_type : ram_style_t
  );
  port (
    clk : in std_ulogic;
    -- Only need to assign the clock if generic asynchronous is "True"
    clk_input : in std_ulogic;
    --# {{}}
    input_m2s : in axi_m2s_r_t;
    input_s2m : out axi_s2m_r_t;
    --# {{}}
    output_m2s : out axi_m2s_r_t;
    output_s2m : in axi_s2m_r_t;
    -- Level of the FIFO. If this is an asynchronous FIFO, this value is on the "output" side.
    output_level : out integer range 0 to depth
  );
end component;

FIFO for AXI read response channel (R). Can be used as clock crossing by setting the asynchronous generic. By setting the width generics, the bus is packed optimally so that no unnecessary resources are consumed.

Note

If asynchronous operation is enabled, the constraints of asynchronous_fifo.vhd must be used.

axi_read_cdc.vhd

View source code on gitlab.com.

component axi_read_cdc is
  generic (
    id_width : natural;
    addr_width : positive;
    data_width : positive;
    enable_data_fifo_packet_mode : boolean;
    address_fifo_depth : positive;
    address_fifo_ram_type : ram_style_t;
    data_fifo_depth : positive;
    data_fifo_ram_type : ram_style_t
  );
  port (
    clk_input : in std_ulogic;
    input_m2s : in axi_read_m2s_t;
    input_s2m : out axi_read_s2m_t;
    --# {{}}
    clk_output : in std_ulogic;
    output_m2s : out axi_read_m2s_t;
    output_s2m : in axi_read_s2m_t;
    output_data_fifo_level : out integer range 0 to data_fifo_depth
  );
end component;

Clock domain crossing of a full AXI read bus using asynchronous FIFOs for the AR and R channels. By setting the width generics, the bus is packed optimally so that no unnecessary resources are consumed.

Note

The constraints of asynchronous_fifo.vhd must be used.

axi_read_pipeline.vhd

View source code on gitlab.com.

component axi_read_pipeline is
  generic (
    addr_width : positive;
    id_width : natural;
    data_width : positive;
    -- Can be changed from default in order to decrease logic utilization, at the cost of lower
    -- throughput. See handshake_pipeline for details.
    full_address_throughput : boolean;
    full_data_throughput : boolean
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    left_m2s : in axi_read_m2s_t;
    left_s2m : out axi_read_s2m_t;
    --# {{}}
    right_m2s : out axi_read_m2s_t;
    right_s2m : in axi_read_s2m_t
  );
end component;

Pipeline the AR and R channels of an AXI read bus. The generics can be used to control throughput settings, which affects the logic footprint.

axi_read_throttle.vhd

View source code on gitlab.com.

component axi_read_throttle is
  generic(
    data_fifo_depth : positive;
    max_burst_length_beats : positive;
    id_width : natural;
    addr_width : positive;
    -- The AR channel is pipelined one step to improve poor timing, mainly on ARVALID.
    -- If this generic is set to false, the pipelining will be of a simpler model that has lower
    -- logic footprint, but only allow a transaction every third clock cycle. If it is set to true,
    -- the pipeline will support a transaction every clock cycle, at the cost of a greater
    -- logic footprint.
    full_ar_throughput : boolean
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    data_fifo_level : in natural range 0 to data_fifo_depth;
    --# {{}}
    input_m2s : in axi_read_m2s_t;
    input_s2m : out axi_read_s2m_t;
    --# {{}}
    throttled_m2s : out axi_read_m2s_t;
    throttled_s2m : in axi_read_s2m_t
  );
end component;

Performs throttling of an AXI read bus by limiting the number of outstanding transactions, which makes the AXI master well behaved.

This entity is to be used in conjunction with a data FIFO on the input.r side. Using the level from that FIFO, the throttling will make sure that address transactions are not made that would result in the FIFO becoming full. This avoids stalling on the throttled_s2m.r channel.

To achieve this, it keeps track of the number of outstanding beats that have been negotiated but not yet sent.

digraph my_graph {
graph [dpi = 300];
rankdir="LR";

ar [shape=none label="AR"];
r [shape=none label="R"];

{
  rank=same;
  ar;
  r;
}

r_fifo [label="" shape=none image="fifo.png"];
axi_read_throttle [shape=box label="AXI read\nthrottle"];
ar -> axi_read_throttle;

axi_slave [shape=box label="AXI slave" height=2];

axi_read_throttle -> axi_slave [label="AR"];
r_fifo -> axi_slave [dir="back"];
r -> r_fifo [dir="back"];
r_fifo:n -> axi_read_throttle:s [label="level"];
}
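The bookkeeping described above can be sketched as a standalone entity. This is a simplified illustration and not the implementation in axi_read_throttle.vhd: the entity name, the flattened ar_done, ar_beats and r_done ports, and the allow_ar output are assumptions made for the example, and it assumes that upstream logic only performs an AR transaction while allow_ar is high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity read_throttle_example is
  generic (
    data_fifo_depth : positive := 1024;
    max_burst_length_beats : positive := 256
  );
  port (
    clk : in std_ulogic;
    -- Current level of the R data FIFO.
    data_fifo_level : in natural range 0 to data_fifo_depth;
    -- High for one clock cycle when an AR transaction occurs (ARVALID and ARREADY).
    ar_done : in std_ulogic;
    -- Number of beats in that AR transaction (ARLEN + 1).
    ar_beats : in positive range 1 to max_burst_length_beats;
    -- High for one clock cycle when an R beat is written to the data FIFO.
    r_done : in std_ulogic;
    -- High when a burst of maximum length is guaranteed to fit in the data FIFO.
    allow_ar : out std_ulogic
  );
end entity;

architecture a of read_throttle_example is

  -- Beats that have been requested on AR but have not yet arrived on R.
  signal outstanding_beats : natural range 0 to data_fifo_depth := 0;

begin

  count : process
    variable beats : integer;
  begin
    wait until rising_edge(clk);

    beats := outstanding_beats;

    if ar_done = '1' then
      beats := beats + ar_beats;
    end if;

    if r_done = '1' then
      beats := beats - 1;
    end if;

    outstanding_beats <= beats;
  end process;

  -- Worst case: what is already in the FIFO, plus what is on its way,
  -- plus one more burst of maximum length, must not exceed the FIFO depth.
  allow_ar <=
    '1' when data_fifo_level + outstanding_beats + max_burst_length_beats <= data_fifo_depth
    else '0';

end architecture;
```

Gating ARVALID with a condition of this kind means that every beat that has been requested always has room in the FIFO, so the R channel towards the slave does not need to stall.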

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_axi.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for axi_read_throttle.vhd netlist builds.

Generics | Total LUTs | FFs | Maximum logic level
data_fifo_depth = 1024, max_burst_length_beats = 256, id_width = 6, addr_width = 32, full_ar_throughput = False, full_aw_throughput = False | 40 | 75 | 9

axi_simple_read_crossbar.vhd

View source code on gitlab.com.

component axi_simple_read_crossbar is
  generic(
    num_inputs : positive
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    input_ports_m2s : in axi_read_m2s_vec_t;
    input_ports_s2m : out axi_read_s2m_vec_t;
    --# {{}}
    output_m2s : out axi_read_m2s_t;
    output_s2m : in axi_read_s2m_t
  );
end component;

Simple N-to-1 crossbar for connecting multiple AXI read masters to one port.

Uses round-robin scheduling for the input_ports. It is simple in the sense that there is no separation of AXI AR and R channels with separate queues. After a port has been selected for an address transaction, the crossbar is locked on that port until it has finished its read response transactions. After that, the crossbar moves on to do a new address transaction on, possibly, another port.

Because of this, it has a very small logic footprint but will never reach full utilization of the data channels. In order to get higher throughput, further address transactions should be queued up to the slave while a read response burst is running.
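The select-and-lock behavior described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the code of axi_simple_read_crossbar.vhd: the entity name, the flattened handshake ports (ar_valid, ar_done, r_last_done) and the state names are all hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity simple_read_arbiter_example is
  generic (
    num_inputs : positive := 4
  );
  port (
    clk : in std_ulogic;
    -- ARVALID from each input port.
    ar_valid : in std_ulogic_vector(0 to num_inputs - 1);
    -- High for one clock cycle when the AR transaction of the selected port occurs.
    ar_done : in std_ulogic;
    -- High for one clock cycle when the last R beat (RLAST) of the burst occurs.
    r_last_done : in std_ulogic;
    -- The port that the crossbar is locked on. Only meaningful when 'locked' is high.
    selected : out natural range 0 to num_inputs - 1;
    locked : out std_ulogic
  );
end entity;

architecture a of simple_read_arbiter_example is

  type state_t is (wait_for_ar_valid, wait_for_ar_done, wait_for_r_done);
  signal state : state_t := wait_for_ar_valid;

  signal candidate : natural range 0 to num_inputs - 1 := 0;

begin

  selected <= candidate;
  locked <= '0' when state = wait_for_ar_valid else '1';

  arbitrate : process
  begin
    wait until rising_edge(clk);

    case state is
      when wait_for_ar_valid =>
        if ar_valid(candidate) = '1' then
          -- Lock on this port until its whole read burst has finished.
          state <= wait_for_ar_done;
        else
          -- Round-robin: look at the next port in the next clock cycle.
          candidate <= (candidate + 1) mod num_inputs;
        end if;

      when wait_for_ar_done =>
        if ar_done = '1' then
          state <= wait_for_r_done;
        end if;

      when wait_for_r_done =>
        if r_last_done = '1' then
          -- Burst finished. Move on so that the same port is not favored.
          candidate <= (candidate + 1) mod num_inputs;
          state <= wait_for_ar_valid;
        end if;
    end case;
  end process;

end architecture;
```

The sketch also shows why the data channel cannot be fully utilized: no new address transaction is started while the state machine is waiting for the read response burst to finish.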

axi_simple_write_crossbar.vhd

View source code on gitlab.com.

component axi_simple_write_crossbar is
  generic(
    num_inputs : positive
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    input_ports_m2s : in axi_write_m2s_vec_t;
    input_ports_s2m : out axi_write_s2m_vec_t;
    --# {{}}
    output_m2s : out axi_write_m2s_t;
    output_s2m : in axi_write_s2m_t
  );
end component;

Simple N-to-1 crossbar for connecting multiple AXI write masters to one port.

Uses round-robin scheduling for the input_ports. It is simple in the sense that there is no separation of AXI AW/W/B channels with separate queues. After a port has been selected for an address transaction, the crossbar is locked on that port until it has finished its write (W) transactions and write response (B) transaction. After that, the crossbar moves on to do a new address transaction on, possibly, another port.

Because of this, it has a very small logic footprint but will never reach full utilization of the data channels. In order to reach higher throughput, there needs to be separation of the channels so that further AW transactions are queued up while other W and B transactions are running, and further W transactions are performed while waiting for other B transactions.

axi_stream_fifo.vhd

View source code on gitlab.com.

component axi_stream_fifo is
  generic (
    data_width : positive;
    user_width : natural;
    asynchronous : boolean;
    depth : positive;
    ram_type : ram_style_t
  );
  port (
    clk : in std_ulogic;
    -- Only need to assign the clock if generic asynchronous is "True"
    clk_output : in std_ulogic;
    --# {{}}
    input_m2s : in axi_stream_m2s_t;
    input_s2m : out axi_stream_s2m_t;
    --# {{}}
    output_m2s : out axi_stream_m2s_t;
    output_s2m : in axi_stream_s2m_t
  );
end component;

FIFO for AXI Stream. Can be used as clock crossing by setting the asynchronous generic. By setting the width generics, the bus is packed optimally so that no unnecessary resources are consumed.

Note

If asynchronous operation is enabled, the constraints of asynchronous_fifo.vhd must be used.

axi_stream_pkg.vhd

View source code on gitlab.com.

Data types for working with AXI4-Stream interfaces. Based on the document “ARM IHI 0051A (ID030610) AMBA 4 AXI4-Stream Protocol Specification”, available here: https://developer.arm.com/documentation/ihi0051/a/

axi_to_axi_lite.vhd

View source code on gitlab.com.

component axi_to_axi_lite is
  generic (
    data_width : positive
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    axi_m2s : in axi_m2s_t;
    axi_s2m : out axi_s2m_t;
    --# {{}}
    axi_lite_m2s : out axi_lite_m2s_t;
    axi_lite_s2m : in axi_lite_s2m_t
  );
end component;

Convert AXI transfers to AXI-Lite transfers.

This module does not handle conversion of ill behaved AXI transfers. The burst length has to be one and the size must be the width of the bus. If these conditions are not met, the read/write response will signal SLVERR.

This module will throttle the AXI bus so that there is never more than one outstanding transaction (read and write separate). While the AXI-Lite standard does allow for outstanding transactions, some Xilinx cores (namely the PCIe DMA bridge) do not play well with them.
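The two conditions on burst length and size can be expressed as a small check function. This is a sketch with hypothetical names (burst_check_example_pkg, is_well_behaved_example) and not the code of axi_to_axi_lite.vhd. It relies on the AXI encoding where the LEN field holds the number of beats minus one and the SIZE field holds log2 of the number of bytes per beat, and it assumes that data_width is a power of two of at least 8.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package burst_check_example_pkg is

  -- Returns true if the address transaction is a single beat that uses
  -- the full width of a data bus that is data_width bits wide.
  function is_well_behaved_example(
    len : unsigned;
    size : unsigned;
    data_width : positive
  ) return boolean;

end package;

package body burst_check_example_pkg is

  function is_well_behaved_example(
    len : unsigned;
    size : unsigned;
    data_width : positive
  ) return boolean is
    constant bytes_per_beat : positive := data_width / 8;
    variable expected_size : natural := 0;
  begin
    -- Integer log2 of the number of bytes per beat.
    while 2 ** (expected_size + 1) <= bytes_per_beat loop
      expected_size := expected_size + 1;
    end loop;

    -- LEN = 0 means a burst length of one beat.
    return len = 0 and size = expected_size;
  end function;

end package body;
```

For a 32-bit data bus, the expected size value is 2 (four bytes per beat), so a transaction with LEN = 0 and SIZE = "010" passes the check while any other combination would be answered with SLVERR.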

axi_to_axi_lite_vec.vhd

View source code on gitlab.com.

component axi_to_axi_lite_vec is
  generic (
    axi_lite_slaves : addr_and_mask_vec_t;
    -- Set to false in order to insert a CDC for this slave.
    -- Must also set clk_axi_lite_vec.
    clocks_are_the_same : boolean_vector;
    -- Optionally insert a pipeline stage on the AXI-Lite bus after the AXI to AXI-Lite conversion
    pipeline_axi_lite : boolean;
    -- Optionally insert a pipeline stage after the axi_lite_mux for each slave
    pipeline_slaves : boolean
  );
  port (
    clk_axi : in std_ulogic;
    axi_m2s : in axi_m2s_t;
    axi_s2m : out axi_s2m_t;
    --# {{}}
    -- Only need to set if different from clk_axi
    clk_axi_lite_vec : in std_ulogic_vector;
    axi_lite_m2s_vec : out axi_lite_m2s_vec_t;
    axi_lite_s2m_vec : in axi_lite_s2m_vec_t
  );
end component;

Convenience wrapper for converting an AXI bus to AXI-Lite, and then splitting and CDC’ing a register bus. The goal is to split a register bus, and have each resulting AXI-Lite bus in the same clock domain as the module that uses the registers. Typically used in chip top levels.

Instantiates axi_to_axi_lite.vhd, axi_lite_mux.vhd and axi_lite_cdc.vhd.

axi_w_fifo.vhd

View source code on gitlab.com.

component axi_w_fifo is
  generic (
    asynchronous : boolean;
    data_width : positive;
    depth : natural;
    enable_packet_mode : boolean;
    ram_type : ram_style_t
  );
  port (
    clk : in std_ulogic;
    -- Only need to assign the clock if generic asynchronous is "True"
    clk_input : in std_ulogic;
    --# {{}}
    input_m2s : in axi_m2s_w_t;
    input_s2m : out axi_s2m_w_t;
    --# {{}}
    output_m2s : out axi_m2s_w_t;
    output_s2m : in axi_s2m_w_t
  );
end component;

FIFO for AXI write data channel (W). Can be used as clock crossing by setting the asynchronous generic. By setting the data_width generic, the bus is packed optimally so that no unnecessary resources are consumed.

Note

If asynchronous operation is enabled, the constraints of asynchronous_fifo.vhd must be used.

axi_write_cdc.vhd

View source code on gitlab.com.

component axi_write_cdc is
  generic (
    id_width : natural;
    addr_width : positive;
    data_width : positive;
    enable_data_fifo_packet_mode : boolean;
    address_fifo_depth : positive;
    address_fifo_ram_type : ram_style_t;
    data_fifo_depth : positive;
    data_fifo_ram_type : ram_style_t;
    response_fifo_depth : positive;
    response_fifo_ram_type : ram_style_t
  );
  port (
    clk_input : in std_ulogic;
    input_m2s : in axi_write_m2s_t;
    input_s2m : out axi_write_s2m_t;
    --# {{}}
    clk_output : in std_ulogic;
    output_m2s : out axi_write_m2s_t;
    output_s2m : in axi_write_s2m_t
  );
end component;

Clock domain crossing of a full AXI write bus using asynchronous FIFOs for the AW, W and B channels. By setting the width generics, the bus is packed optimally so that no unnecessary resources are consumed.

Note

The constraints of asynchronous_fifo.vhd must be used.

axi_write_pipeline.vhd

View source code on gitlab.com.

component axi_write_pipeline is
  generic (
    addr_width : positive;
    id_width : natural;
    data_width : positive;
    -- Can be changed from default in order to decrease logic utilization, at the cost of lower
    -- throughput. See handshake_pipeline for details.
    full_address_throughput : boolean;
    full_data_throughput : boolean
  );
  port (
    clk : in std_ulogic;
    --# {{}}
    left_m2s : in axi_write_m2s_t;
    left_s2m : out axi_write_s2m_t;
    --# {{}}
    right_m2s : out axi_write_m2s_t;
    right_s2m : in axi_write_s2m_t
  );
end component;

Pipeline the AW, W and B channels of an AXI write bus. The generics can be used to control throughput settings, which affects the logic footprint.

axi_write_throttle.vhd

View source code on gitlab.com.

component axi_write_throttle is
  port (
    clk : in std_ulogic;
    --# {{}}
    input_m2s : in axi_write_m2s_t;
    input_s2m : out axi_write_s2m_t;
    --# {{}}
    throttled_m2s : out axi_write_m2s_t;
    throttled_s2m : in axi_write_s2m_t
  );
end component;

Performs throttling of an AXI write bus with the goal of making the AXI write master well behaved. This entity makes sure that AWVALID is asserted in the same clock cycle as the first WVALID of the corresponding data burst.

This, along with the two conditions below, realizes the strictest condition imaginable for an AXI write master interface being well behaved. It guarantees that not a single clock cycle is wasted on the throttled interface.

  1. The entity should be used in conjunction with a data FIFO on the input.w side that has packet mode enabled. This ensures that once WVALID has been asserted, it remains high until the WLAST transaction has occurred.

  2. The input.b.ready signal should be statically '1'. This ensures that the B master on the throttled side is never stalled.

The imagined use case for this entity is with an AXI crossbar where the throughput should not be limited by one port starving out the others by being ill behaved. In this case it makes sense to use this throttler on each port.

However, if a crossbar is not used, and the AXI bus goes directly to an AXI slave that has FIFOs on the AW and W channels, then there is no point in using this throttler. These FIFOs can be either in logic (in e.g. an AXI DDR4 controller) or in the “hard” AXI controller in e.g. a Xilinx Zynq device.

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_axi.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for axi_write_throttle.vhd netlist builds.

Generics | Total LUTs | FFs | Maximum logic level
(none) | 5 | 2 | 2