Module common

This document contains technical documentation for the common module.

addr_pkg.vhd

Collection of types/functions for working with address decode/matching.

attribute_pkg.vhd

Commonly used attributes for Xilinx Vivado, with descriptions and their valid values. Information gathered from documents UG901 and UG912.

axi_stream_protocol_checker.vhd

component axi_stream_protocol_checker is
  generic (
    -- Assign a non-zero value in order to use the 'data'/'strobe' ports for protocol checking
    data_width : natural;
    -- Assign a non-zero value in order to use the 'id' port for protocol checking
    id_width : natural;
    logger_name_suffix : string;
    -- This can be used to essentially disable the
    --   "rule 4: Check failed for performance - tready active N clock cycles after tvalid."
    -- warning by setting a very high value for the limit.
    -- This warning is considered noise in most testbenches that exercise backpressure.
    -- Set to a lower value in order to enable the warning.
    rule_4_performance_check_max_waits : natural
  );
  port (
    clk : in std_logic;
    --
    ready : in std_logic;
    valid : in std_logic;
    -- Optional to connect.
    last : in std_logic;
    -- Optional to connect.
    -- Must set a valid 'id_width' generic value in order to use this.
    id : in std_logic_vector;
    -- Optional to connect.
    -- Must set a valid 'data_width' generic value in order to use these.
    data : in std_logic_vector;
    strobe : in std_logic_vector
  );
end component;

A wrapper around the VUnit AXI-Stream protocol checker. It has a simpler interface, and can hence be included in synthesizable code with a generate guard:

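-- The generate guard keeps this simulation-only checker out of synthesis.
-- The generate label is an arbitrary example name.
axi_stream_protocol_checker_gen : if in_simulation generate

  axi_stream_protocol_checker_inst : entity common.axi_stream_protocol_checker
    generic map (
      ...
    );

end generate;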

Without the generate guard, synthesis will fail. The file is placed in the “sim” folder, so it will not be included in synthesis projects by default when using tsfpga.

clock_counter.vhd

component clock_counter is
  generic (
    resolution_bits : positive;
    max_relation_bits : positive;
    -- The shift register length is device specific.
    -- For Xilinx UltraScale and 7 series devices, it should be set to 33.
    shift_register_length : integer
  );
  port (
    target_clock : in std_logic;
    --# {{}}
    reference_clock : in std_logic;
    target_tick_count : out unsigned
  );
end component;

Measure the switching rate of an unknown clock by using a free-running reference clock of known frequency.

Note

This entity instantiates a resync_counter.vhd block. See documentation of that entity for constraining details.

The frequency of target_clock is given by

target_tick_count * reference_clock_frequency / 2 ** resolution_bits

The target_tick_count value is updated every 2 ** resolution_bits cycles. It is invalid for the first 2 * 2 ** resolution_bits cycles after reference_clock starts switching, but after that it is always valid.

For the calculation to work, target_clock must be no more than 2 ** (max_relation_bits - 1) times faster than reference_clock.
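The formula and the ratio limit above can be checked with a few lines of Python. This is a minimal sketch for software that reads out target_tick_count, not part of the VHDL entity. The function names and the example values (100 MHz reference clock, tick count 2560) are assumptions made for illustration.

```python
def target_clock_frequency_hz(target_tick_count, reference_clock_frequency_hz, resolution_bits):
    """Evaluate the formula above: tick count scaled by the measurement window length."""
    return target_tick_count * reference_clock_frequency_hz / 2 ** resolution_bits


def max_frequency_ratio(max_relation_bits):
    """Largest supported ratio between target_clock and reference_clock frequency."""
    return 2 ** (max_relation_bits - 1)


# Hypothetical example values: 100 MHz reference clock and a read-out tick count of 2560.
frequency_hz = target_clock_frequency_hz(
    target_tick_count=2560, reference_clock_frequency_hz=100e6, resolution_bits=10
)
print(frequency_hz)  # 250000000.0
print(frequency_hz / 100e6 <= max_frequency_ratio(max_relation_bits=4))  # True
```

With resolution_bits = 10, the measurement window is 1024 reference cycles, so 2560 ticks correspond to a target clock that is 2.5 times faster than the reference.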

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_common.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for clock_counter.vhd netlist builds.

Generics                                     Total LUTs  SRLs  FFs
-------------------------------------------  ----------  ----  ---
resolution_bits = 24, max_relation_bits = 6  84          5     185
resolution_bits = 10, max_relation_bits = 4  38          2     86

common_pkg.vhd

Package with common features that do not fit anywhere else, and are not significant enough to warrant their own package.

debounce.vhd

component debounce is
  generic (
    -- Number of cycles the input must be stable for the value to propagate to the result side.
    stable_count : positive
  );
  port (
    -- Input value that may be metastable and noisy
    noisy_input : in std_logic;
    --# {{}}
    clk : in std_logic;
    stable_result : out std_logic
  );
end component;

Simple debounce mechanism to be used with e.g. the signal from a button or dip switch. It eliminates noise and metastability by requiring the input to have a stable value for a specified number of clock cycles before propagating the value.
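The stability requirement can be illustrated with a small behavioral model in Python. This is only a sketch of the principle described above. It does not model the resync_level.vhd stage or the exact cycle latency of the entity, and the function name and the initial_result parameter are assumptions.

```python
def debounce(samples, stable_count, initial_result=0):
    """Return the debounced value for each clock cycle, given one input sample per cycle."""
    result = initial_result
    candidate = initial_result
    run_length = 0
    results = []
    for sample in samples:
        if sample == candidate:
            run_length += 1
        else:
            # The input changed, so start counting stable cycles from scratch.
            candidate = sample
            run_length = 1
        if run_length >= stable_count:
            # The input has held its value long enough to be propagated.
            result = candidate
        results.append(result)
    return results


# Glitches shorter than 'stable_count' cycles never reach the result.
print(debounce([1, 0, 1, 1, 1, 0, 1, 1, 1, 1], stable_count=3))
# [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```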

Note

This entity instantiates a resync_level.vhd block (async_reg chain) to make sure the input is not metastable. The resync_level.vhd has a scoped constraint file that must be used.

handshake_mux.vhd

component handshake_mux is
  generic (
    num_inputs : positive;
    data_width : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    input_ready : out std_logic_vector;
    input_valid : in std_logic_vector;
    input_last : in std_logic_vector;
    input_data : in slv_vec_t;
    input_strobe : in slv_vec_t;
    --# {{}}
    result_ready : in std_logic;
    result_valid : out std_logic;
    result_last : out std_logic;
    result_data : out std_logic_vector;
    result_strobe : out std_logic_vector;
    -- The input port index where the packet originated
    result_id : out natural range 0 to num_inputs - 1
  );
end component;

Multiplex between many AXI-Stream-like inputs towards one output bus. Will lock onto one input and let its data through until a packet has passed, as indicated by the last signal.

The implementation is simple, which comes with a few limitations:

Warning

If there are holes in an input packet stream after valid has been asserted, this multiplexer will be unnecessarily stalled even if another input has data available. It is up to the user to make sure that this does not occur, e.g. by using a fifo.vhd in packet mode, or to calculate that the system throughput is still sufficient.

The arbitration is done in the most resource-efficient round-robin manner possible, which means that one input can starve out the others if it continuously sends data.

handshake_pipeline.vhd

component handshake_pipeline is
  generic (
    data_width : natural;
    -- Setting to false can save logic footprint, at the cost of lower throughput
    full_throughput : boolean;
    -- Ensures that there is no combinatorial path between valid and ready on input and output.
    -- Will result in higher logic footprint.
    pipeline_control_signals : boolean;
    -- Ensures that there is no combinatorial path from data, strobe and last on input to output.
    -- Will result in higher logic footprint.
    pipeline_data_signals : boolean;
    -- In the typical use case where we want a "byte strobe", this would be eight.
    -- In other cases, for example when the data is packed, we might use a higher value.
    -- Must assign a valid value if input/output strobe is to be used.
    strobe_unit_width : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    input_ready : out std_logic;
    input_valid : in std_logic;
    -- Optional to connect.
    input_last : in std_logic;
    input_data : in std_logic_vector;
    -- Optional to connect. Must set valid 'strobe_unit_width' generic value in order to use this.
    input_strobe : in std_logic_vector;
    --# {{}}
    output_ready : in std_logic;
    output_valid : out std_logic;
    output_last : out std_logic;
    output_data : out std_logic_vector;
    output_strobe : out std_logic_vector
  );
end component;

Handshake pipeline. Used to ease the timing of a streaming data interface by inserting register stages on the data and/or control signals.

There are many modes available, with different characteristics, that are enabled with different combinations of full_throughput, pipeline_control_signals and pipeline_data_signals. See the descriptions within the code for more details about throughput and fanout.

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_common.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for handshake_pipeline.vhd netlist builds.

Generics                                                                                                   Total LUTs  FFs  Maximum logic level
---------------------------------------------------------------------------------------------------------  ----------  ---  -------------------
data_width = 32, full_throughput = True, pipeline_control_signals = True, pipeline_data_signals = True     41          78   2
data_width = 32, full_throughput = True, pipeline_control_signals = False, pipeline_data_signals = True    1           38   2
data_width = 32, full_throughput = True, pipeline_control_signals = False, pipeline_data_signals = False   0           0    0
data_width = 32, full_throughput = False, pipeline_control_signals = True, pipeline_data_signals = True    1           39   2
data_width = 32, full_throughput = False, pipeline_control_signals = True, pipeline_data_signals = False   2           3    2
data_width = 32, full_throughput = False, pipeline_control_signals = False, pipeline_data_signals = True   2           38   2
data_width = 32, full_throughput = False, pipeline_control_signals = False, pipeline_data_signals = False  0           0    0

handshake_splitter.vhd

component handshake_splitter is
  generic (
    num_interfaces : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    input_ready : out std_logic;
    input_valid : in std_logic;
    --# {{}}
    output_ready : in std_logic_vector;
    output_valid : out std_logic_vector
  );
end component;

Combinatorially split an AXI-Stream-like handshaking interface, for cases where many slaves are to receive the data. Maintains full throughput and is AXI-Stream compliant in its handling of the handshake signals (valid does not wait for ready, valid does not fall unless a transaction has occurred).

This entity has no pipelining of the handshake signals, but instead connects them combinatorially. This increases the logic depth for handshake signals where this entity is used. If timing issues occur (on the input or one of the outputs), a handshake_pipeline.vhd instance can be used.

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_common.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for handshake_splitter.vhd netlist builds.

Generics            Total LUTs  FFs
------------------  ----------  ---
num_interfaces = 2  4           2
num_interfaces = 4  9           4

keep_remover.vhd

component keep_remover is
  generic (
    data_width : positive;
    strobe_unit_width : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    input_ready : out std_logic;
    input_valid : in std_logic;
    input_last : in std_logic;
    input_data : in std_logic_vector;
    input_keep : in std_logic_vector;
    --# {{}}
    output_ready : in std_logic;
    output_valid : out std_logic;
    output_last : out std_logic;
    output_data : out std_logic_vector;
    output_strobe : out std_logic_vector
  );
end component;

This entity removes strobed-out lanes from the input, resulting in an output stream where all lanes are always strobed (except for the last beat, potentially). The strobe on input can be considered as the TKEEP signal in AXI-Stream terminology, and the output strobe would be TKEEP/TSTRB.

The entity works by continuously filling up a data buffer with data from the input. Only the lanes that are strobed will be saved to the buffer. Note that input words may have all their lanes strobed out (except for the last beat, see below). When enough lanes are saved to fill a whole word, data is passed to the output by asserting output_valid. When input_last is asserted for an input word, an output word will be sent out, with output_last asserted, even if a whole strobed word is not currently filled in the buffer.

The strobe unit data width is configurable via a generic. Most of the time it would be eight, i.e. a byte strobe. But in some cases the strobe represents a wider quantum, in which case the generic can be increased. Increasing the generic will drastically decrease the resource utilization, since that is the “atom” of data that is handled internally.

The handling of input_last presents a corner case. Let's assume that data_width is 16 and strobe_unit_width is 8. Furthermore, assume that there is one atom of data available in the buffer, and that an input word arrives with both lanes strobed and input_last asserted. In this case, one input word shall result in two output words. The first output word comes from a whole word being filled in the buffer. The second word comes from a half-filled word in the buffer, but with input_last being asserted. This is solved by having a small state machine that pads input data with an extra word when this corner case arises. The padding stage makes it possible to have a very simple data buffer stage, with low resource utilization.
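The buffer-filling behavior and the input_last corner case can be illustrated with a behavioral model in Python. This is a sketch of the data flow only, under the assumption that the input stream follows the limitations listed for this entity. It does not model handshaking, timing, or the padding state machine, and the function name and data representation are hypothetical.

```python
def keep_remover_model(beats, lanes_per_word):
    """Pack the kept lanes of the input into output words.

    'beats' is a list of (lanes, keep, last), with lane 0 first in each list.
    Returns a list of (lanes, strobe, last). Unused output lanes are None.
    """
    buffer = []
    output = []
    for lanes, keep, last in beats:
        # Only the lanes that are strobed are saved to the buffer.
        buffer.extend(lane for lane, kept in zip(lanes, keep) if kept)
        # Pass on every whole word that the buffer holds.
        while len(buffer) >= lanes_per_word:
            word, buffer = buffer[:lanes_per_word], buffer[lanes_per_word:]
            output.append((word, [True] * lanes_per_word, last and not buffer))
        # Upon 'last', flush a partially filled word.
        if last and buffer:
            strobe = [index < len(buffer) for index in range(lanes_per_word)]
            padding = [None] * (lanes_per_word - len(buffer))
            output.append((buffer + padding, strobe, True))
            buffer = []
    return output


# The corner case: one atom is already buffered when a full 'last' word arrives.
result = keep_remover_model(
    [([0xA, 0xB], [True, False], False), ([0xC, 0xD], [True, True], True)],
    lanes_per_word=2,
)
print(len(result))  # 2
```

The second input word alone produces both output words here: one full word, and one half-filled word with last asserted.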

Throughput

The entity achieves full throughput, except for the corner case mentioned above, where it might stall one cycle on the input.

Limitations

  • input_last may not be asserted on an input word that has all lanes strobed out.

  • There may never be a ‘1’ above a ‘0’ in the input strobe. E.g. “0111” is allowed, but “1100” is not.

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_common.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for keep_remover.vhd netlist builds.

Generics                                  Total LUTs  FFs  Maximum logic level  DSP Blocks
----------------------------------------  ----------  ---  -------------------  ----------
data_width = 32, strobe_unit_width = 16   88          79   3                    0
data_width = 64, strobe_unit_width = 8    410         175  6                    0
data_width = 128, strobe_unit_width = 32  414         282  5                    0

periodic_pulser.vhd

component periodic_pulser is
  generic (
    -- The period between pulses
    period : positive range 2 to integer'high;
    -- The shift register length is device specific.
    -- For Xilinx UltraScale and 7 series devices, it should be set to 33.
    shift_register_length : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    count_enable : in std_logic;
    pulse : out std_logic
  );
end component;

Outputs a one cycle pulse after a generic number of assertions of count_enable.

Shift registers are used as far as possible to create the pulse. This makes the implementation resource efficient on devices with cheap shift registers (such as SRLs in Xilinx devices). In the worst case a single counter is created.

The period is broken down into factors that are represented using shift registers, with the shift register length being the factor value. By rotating the shift register on each count_enable, a fixed period is created. When possible, multiple shift registers are AND-gated to create a longer period. For example, a period of 30 can be achieved by gating two registers of length 10 and 3. This method only works if the lengths are coprime (i.e. their greatest common divisor is 1).

If the remaining factor is not 1 after the shift registers have been added, a new instance of this module is added through recursion.

If period cannot be factorized into one or more shift registers, recursion ends with either a simple counter or a longer shift register (depending on the size of the factor).
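The need for coprime shift register lengths can be demonstrated with a short Python simulation. This is a minimal sketch of the AND-gating principle only, not of the factorization or recursion performed by the entity. The function name and the num_enables parameter are assumptions.

```python
def pulse_period(shift_register_lengths, num_enables=1000):
    """Return the distance between pulses from AND-gated rotating shift registers."""
    # Each register holds a single '1'. Track the position of that '1' in each register.
    positions = [0] * len(shift_register_lengths)
    pulse_times = []
    for time in range(num_enables):
        # Rotate every register one step per 'count_enable'.
        positions = [
            (position + 1) % length
            for position, length in zip(positions, shift_register_lengths)
        ]
        # The AND gate outputs '1' only when every register presents its '1'.
        if all(position == 0 for position in positions):
            pulse_times.append(time)
    distances = {later - earlier for earlier, later in zip(pulse_times, pulse_times[1:])}
    if len(distances) != 1:
        raise ValueError("Could not determine a fixed period")
    return distances.pop()


print(pulse_period([10, 3]))  # 30, since 10 and 3 are coprime
print(pulse_period([4, 6]))  # 12, not 24, since 4 and 6 share the factor 2
```

The resulting period is the least common multiple of the lengths, which equals their product only when the lengths are coprime.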

Example

Let’s say that the maximum shift register length is 16. A period of 510 = 10 * 3 * 17 can then be achieved using two shift registers of length 10 and 3, and then instantiating a new periodic_pulser.vhd with period 17:

[0][0][0][0][0][0][0][0][0][1]
                              \
                               [AND] -> pulse -> [periodic_pulser with period 17]
                              /
                     [0][0][1]

The next stage will create a counter, because 17 is a prime larger than the maximum shift register length.

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_common.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for periodic_pulser.vhd netlist builds.

Generics                                        Total LUTs  SRLs  FFs
----------------------------------------------  ----------  ----  ---
period = 33, shift_register_length = 33         2           1     1
period = 33, shift_register_length = 1          6           0     6
period = 37, shift_register_length = 33         3           2     1
period = 37, shift_register_length = 1          6           0     6
period = 100, shift_register_length = 33        3           2     2
period = 100, shift_register_length = 1         7           0     7
period = 125, shift_register_length = 33        4           2     2
period = 125, shift_register_length = 1         7           0     7
period = 127, shift_register_length = 33        5           4     1
period = 127, shift_register_length = 1         7           0     7
period = 4625, shift_register_length = 33       7           4     3
period = 4625, shift_register_length = 1        4           0     13
period = 311000000, shift_register_length = 33  17          4     15
period = 311000000, shift_register_length = 1   7           0     29

strobe_on_last.vhd

component strobe_on_last is
  generic (
    data_width : positive
  );
  port (
    clk : in std_logic;
    --# {{}}
    input_ready : out std_logic;
    input_valid : in std_logic;
    input_last : in std_logic;
    input_data : in std_logic_vector;
    input_strobe : in std_logic_vector;
    --# {{}}
    output_ready : in std_logic;
    output_valid : out std_logic;
    output_last : out std_logic;
    output_data : out std_logic_vector;
    output_strobe : out std_logic_vector
  );
end component;

The goal of this entity is to process an AXI-Stream so that, in packets where last is asserted on a word that is completely strobed out, last is instead asserted on the last word that does have a strobe.

As a consequence of this, all words in the stream that are completely strobed out are dropped by this entity.
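The intended stream transformation can be sketched as a behavioral model in Python. This is an illustration of the description above only, with no handshaking or timing. The function name and data representation are hypothetical, and the handling of a packet that has no strobed words at all (dropped entirely in this model) is an assumption, since the text does not cover that case.

```python
def strobe_on_last_model(beats):
    """Drop words that are completely strobed out, and move 'last' to the final strobed word.

    'beats' is a list of (data, strobe, last), where 'strobe' is a list of booleans.
    """
    output = []
    # The most recent strobed word, held until it is known whether it ends the packet.
    pending = None
    for data, strobe, last in beats:
        if any(strobe):
            if pending is not None:
                # A later strobed word exists, so the held word is not the last one.
                output.append((*pending, False))
            pending = (data, strobe)
        if last and pending is not None:
            output.append((*pending, True))
            pending = None
    return output


result = strobe_on_last_model(
    [
        ("a", [True, True], False),
        ("b", [True, False], False),
        ("c", [False, False], False),
        ("d", [False, False], True),
    ]
)
print(result)  # [('a', [True, True], False), ('b', [True, False], True)]
```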

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_common.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for strobe_on_last.vhd netlist builds.

Generics         Total LUTs  FFs  Maximum logic level
---------------  ----------  ---  -------------------
data_width = 8   7           12   3
data_width = 32  8           39   3
data_width = 64  9           75   3

types_pkg.vhd

Some basic types that make it easier to work with VHDL. Also some basic functions operating on these types.

width_conversion.vhd

component width_conversion is
  generic (
    input_width : positive;
    output_width : positive;
    -- Enable usage of the 'input_last' and 'output_last' ports.
    -- Will increase the logic footprint.
    enable_last : boolean;
    -- Enable usage of the 'input_strobe' and 'output_strobe' ports.
    -- Will increase the logic footprint.
    enable_strobe : boolean;
    -- In the typical use case where we want a "byte strobe", this would be eight.
    -- In other cases, for example when the data is packed, we might use a higher value.
    -- Must assign a valid value if 'enable_strobe' is true.
    strobe_unit_width : positive;
    -- Enable if 'input' packet lengths are not a multiple of the 'output' width.
    -- Must set 'enable_strobe' and 'enable_last' as well to use this.
    -- See header for details about how this works.
    -- Will increase the logic footprint.
    support_unaligned_packet_length : boolean
  );
  port (
    clk : in std_logic;
    --# {{}}
    input_ready : out std_logic;
    input_valid : in std_logic;
    -- Optional packet 'last' signalling. Must set 'enable_last' generic in order to use this.
    input_last : in std_logic;
    input_data : in std_logic_vector;
    -- Optional word strobe. Must set 'enable_strobe' generic in order to use this.
    input_strobe : in std_logic_vector;
    --# {{}}
    output_ready : in std_logic;
    output_valid : out std_logic;
    -- Optional packet 'last' signalling. Must set 'enable_last' generic in order to use this.
    output_last : out std_logic;
    output_data : out std_logic_vector;
    -- Optional word strobe. Must set 'enable_strobe' generic in order to use this.
    output_strobe : out std_logic_vector
  );
end component;

Width conversion of an AXI-Stream-like data bus. Can handle downconversion (wide to thin) or upconversion (thin to wide). The data widths must be a power-of-two multiple of each other. E.g. 10->40 is supported while 8->24 is not.
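The width constraint can be expressed as a small check in Python. This is a sketch based only on the rule stated above, and the function name is hypothetical. Treating equal widths as unsupported is an assumption, since no conversion would take place in that case.

```python
def is_supported_width_pair(input_width, output_width):
    """Check that the wide side is a power-of-two multiple of the thin side."""
    wide = max(input_width, output_width)
    thin = min(input_width, output_width)
    if wide % thin != 0:
        return False
    ratio = wide // thin
    # Assumption: a ratio of one (equal widths) is treated as unsupported.
    # A power of two has exactly one bit set, so 'ratio & (ratio - 1)' is zero.
    return ratio >= 2 and (ratio & (ratio - 1)) == 0


print(is_supported_width_pair(10, 40))  # True
print(is_supported_width_pair(8, 24))  # False
```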

There is a generic to enable usage of the last signal. The last indicator will be passed along with the data from the input to output side as-is. Note that enabling the support_unaligned_packet_length generic will enable further processing of last, but in barebone configuration the signal is merely passed on.

There is a generic to enable strobing of data. The strobe will be passed on from input to output side as-is. Note that enabling support_unaligned_packet_length generic will enable further processing of strobe, but in barebone configuration the signal is merely passed on. This means, for example, that there might be output words where all strobe lanes are zero when downconverting.

There are some limitations, and possible remedies, concerning packet length alignment, depending on whether we are doing upconversion or downconversion. See below.

Downconversion

When doing downconversion, one input beat will result in two or more output beats, depending on width configuration. This means that the output packet length is always aligned with the input data width. This is not always desirable when working with the strobe and last signals. Say for example that we are converting a bus from 16 to 8, and input_last is asserted on a beat where the lowest byte is strobed but the highest is not. In this case, we would want output_last to be asserted on the second to last byte, and the last byte (which is strobed out) to be removed. This is achieved by enabling the support_unaligned_packet_length generic. If the generic is not set, output_last will be asserted on the very last byte, which will be strobed out.
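The difference between the two settings can be illustrated with a behavioral model in Python. This is a sketch of the data flow described above, assuming a well-behaved input stream where only the last beat of a packet has strobed-out lanes. It does not model handshaking or timing, and the function name and data representation are hypothetical.

```python
def downconvert_model(beats, lanes_per_output, support_unaligned_packet_length):
    """Split each input beat into output beats of 'lanes_per_output' lanes.

    'beats' is a list of (lanes, strobe, last), with the lowest lane first.
    """
    output = []
    for lanes, strobe, last in beats:
        chunks = [
            (lanes[index : index + lanes_per_output], strobe[index : index + lanes_per_output])
            for index in range(0, len(lanes), lanes_per_output)
        ]
        if support_unaligned_packet_length:
            # Remove output beats that are completely strobed out.
            chunks = [chunk for chunk in chunks if any(chunk[1])]
        for index, (chunk_lanes, chunk_strobe) in enumerate(chunks):
            # 'last' lands on the final output beat that remains from this input beat.
            output.append((chunk_lanes, chunk_strobe, last and index == len(chunks) - 1))
    return output


# Converting 16 to 8: 'last' is asserted on a beat where only the lowest byte is strobed.
packet = [([1, 2], [True, True], False), ([3, 4], [True, False], True)]
print(downconvert_model(packet, 1, support_unaligned_packet_length=True)[-1])
# ([3], [True], True)
print(downconvert_model(packet, 1, support_unaligned_packet_length=False)[-1])
# ([4], [False], True)
```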

Upconversion

When upconverting, two or more input beats result in one output beat, depending on width configuration. This means that the input packet length must be aligned with the output data width, so that each packet fills up a whole number of output words. If this cannot be guaranteed, then the support_unaligned_packet_length mode can be used. When that is enabled, the input stream will be padded upon last indication so that a whole output word is filled. Consider the example of converting a bus from 8 to 16, where input_last is asserted on the third input beat. If support_unaligned_packet_length is disabled, there will be one output beat sent and half an output beat left in the converter. If the mode is enabled however, the input stream will be padded with another byte so that an output beat can be sent. The padded parts will have strobe set to zero.

Note that the handling of unaligned packet lengths is highly dependent on the input stream being well behaved. Specifically:

  1. There may never be input beats where input_strobe is all zeros.

  2. For all beats except the one where input_last is asserted, input_strobe must be asserted on all lanes.
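The padding behavior for upconversion can be illustrated with a behavioral model in Python. This is a sketch of the 8 to 16 example described above, assuming an input stream that follows the two rules in the list. It does not model handshaking or timing, and the function name, the ratio parameter, and the data representation are hypothetical.

```python
def upconvert_model(beats, ratio, support_unaligned_packet_length):
    """Combine 'ratio' input beats into each output beat.

    'beats' is a list of (lanes, strobe, last).
    Returns the output beats and the input beats left in the converter.
    """
    output = []
    pending = []
    for lanes, strobe, last in beats:
        pending.append((lanes, strobe))
        if last and support_unaligned_packet_length:
            # Pad with strobed-out words so that a whole output word is filled.
            while len(pending) < ratio:
                pending.append(([None] * len(lanes), [False] * len(lanes)))
        if len(pending) == ratio:
            output_lanes = [lane for word_lanes, _ in pending for lane in word_lanes]
            output_strobe = [bit for _, word_strobe in pending for bit in word_strobe]
            output.append((output_lanes, output_strobe, last))
            pending = []
    return output, pending


# Converting 8 to 16, with 'last' asserted on the third input beat.
packet = [([1], [True], False), ([2], [True], False), ([3], [True], True)]
print(upconvert_model(packet, 2, support_unaligned_packet_length=False))
# ([([1, 2], [True, True], False)], [([3], [True])])
print(upconvert_model(packet, 2, support_unaligned_packet_length=True))
# ([([1, 2], [True, True], False), ([3, None], [True, False], True)], [])
```

With the mode disabled, the third byte stays in the converter. With the mode enabled, it is sent in a padded output beat where the upper lane has its strobe set to zero.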

Resource utilization

This entity has netlist builds set up with automatic size checkers in module_common.py. The following table lists the resource utilization for the entity, depending on generic configuration.

Resource utilization for width_conversion.vhd netlist builds.

Generics                                                                                                                  Total LUTs  FFs  Maximum logic level
------------------------------------------------------------------------------------------------------------------------  ----------  ---  -------------------
input_width = 32, output_width = 16, enable_strobe = False, enable_last = False, support_unaligned_packet_length = False  20          51   2
input_width = 32, output_width = 16, enable_strobe = True, enable_last = True, support_unaligned_packet_length = False    23          59   2
input_width = 32, output_width = 16, enable_strobe = True, enable_last = True, support_unaligned_packet_length = True     29          63   3
input_width = 16, output_width = 32, enable_strobe = False, enable_last = False, support_unaligned_packet_length = False  35          51   2
input_width = 16, output_width = 32, enable_strobe = True, enable_last = True, support_unaligned_packet_length = False    40          59   2
input_width = 16, output_width = 32, enable_strobe = True, enable_last = True, support_unaligned_packet_length = True     44          62   2