Palmed Demo explained
Basic blocks
A basic block is a block of assembly, in GAS syntax on this page, that contains no branches. Any branching instruction sent through the basic block input form is ignored when the block is fed to the model.
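As an illustration of what "ignoring branching instructions" can mean in practice, here is a minimal Python sketch. It is not the demo's actual implementation: the `BRANCH_MNEMONICS` set and the `strip_branches` helper are hypothetical, the mnemonic list is deliberately non-exhaustive, and the parsing assumes one instruction per line with `#` comments.

```python
# Hypothetical, non-exhaustive set of x86-64 control-flow mnemonics (GAS syntax).
BRANCH_MNEMONICS = {
    "jmp", "je", "jne", "jz", "jnz", "jl", "jle", "jg", "jge",
    "ja", "jae", "jb", "jbe", "call", "ret", "loop",
}


def strip_branches(block: str):
    """Split a basic block into kept and ignored (branching) instructions."""
    kept, ignored = [], []
    for line in block.splitlines():
        code = line.split("#", 1)[0].strip()  # drop '#' comments
        if not code or code.endswith(":"):    # skip blank lines and labels
            continue
        mnemonic = code.split()[0].lower()
        if mnemonic in BRANCH_MNEMONICS:
            ignored.append(code)
        else:
            kept.append(code)
    return kept, ignored


block = """
add %rax, %rbx
jne .L1          # branch: dropped before reaching the model
mov (%rdi), %rcx
"""
kept, ignored = strip_branches(block)
print(kept)     # instructions that would be passed on
print(ignored)  # instructions that would be reported as ignored
```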
Supported instructions
For various reasons, our models do not support every single instruction in the ISA. When an input basic block contains an unsupported instruction, that instruction is simply ignored while computing the output, and a banner states this clearly. Simply ignored means that the computation proceeds as if the instruction were not in the basic block at all.
Here follows a list of unsupported instruction classes.
Unsupported on every ISA
- Control flow. Control flow instructions (jumps, calls, …) are not supported, since they are not part of basic blocks. However, this means that we also ignore the cost of e.g. evaluating the condition of a conditional jump, or the address computation.
Unsupported in x86-64
- Divisions. Divisions are not supported, since their throughput is not easily benchmarked. Their input and output registers are fixed, which creates data dependencies that interfere with throughput measurements.
- Fences. Fences, by definition, introduce behaviour that does not compose well with throughput measurements.
- Some cache control operations. For the same reason, some cache control operations are not easily benchmarked. This includes e.g. CLFSH, PREFETCHW, …
- x87 instructions. The x87 extension set is not supported, since it is barely used nowadays and its microcode is generally sub-optimal on modern CPUs, where SSE is used instead. Additionally, mixing x87 and SSE instructions in a single basic block induces a delay that makes model inference difficult.
- AVXn, depending on the mapping. Not all of our mappings support AVX, AVX2 and AVX512. Depending on the mapping you use, some of these extensions might not be available.
Output
Throughput measurement
Our model is focused on throughput. This means that every insight inferred about a basic block is computed assuming:
- No (limiting) data dependencies. No instruction depends on the result of a previous instruction. This is equivalent to claiming that every instruction B depending on the result of an instruction A is "far enough" from A so that it has enough time to complete before B is executed.
- An infinite loop. The basic block is assumed to be the body of a loop, executed a sufficient number of times so that it can be assumed to be in a steady-state.
Metrics
Cycles. The number of CPU cycles needed to execute the basic block once. This number might be fractional, since we assume we are in steady state (see above).
IPC (Instructions Per Cycle). The number of instructions executed per CPU cycle, on average over this basic block. This metric is relevant to HPC experts as it describes how busy the CPU is.
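The relation between the two metrics can be sketched in a few lines of Python. The `ipc` helper and the numbers below are illustrative assumptions, not values produced by the demo: IPC is simply the number of instructions in the basic block divided by its Cycles metric.

```python
def ipc(instruction_count: int, cycles: float) -> float:
    """Instructions per cycle for one steady-state iteration of a basic block."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instruction_count / cycles


# Hypothetical example: a 4-instruction block predicted at 1.5 cycles per iteration.
print(round(ipc(4, 1.5), 2))
```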
Resources
Each abstract resource used by the given basic block is shown in this part. Each resource has a capacity of 1 per CPU cycle; thus, the Cycles metric for this basic block is the highest total use among all resources.
The bottleneck resource for a given basic block is the most used one; finding a way to alleviate the use of this resource will improve the runtime of this basic block.
The use of each resource is the sum of its use by each individual instruction. This detail can be seen by clicking on a resource in the output, and can be used to focus the efforts on the instructions that actually contribute significantly to a bottleneck.
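The computation described above (sum per resource, then take the maximum) can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's actual code: the resource names `R0` to `R2`, the `MAPPING` table and its usage values are all made up for the example.

```python
from collections import defaultdict

# Hypothetical mapping: how much of each abstract resource one instruction uses.
MAPPING = {
    "add": {"R0": 0.25},
    "mul": {"R0": 0.25, "R1": 1.0},
    "load": {"R2": 0.5},
}


def resource_usage(block, mapping):
    """Total use of each resource: the sum of its use by each instruction."""
    usage = defaultdict(float)
    for instr in block:
        for resource, amount in mapping[instr].items():
            usage[resource] += amount
    return dict(usage)


def cycles_and_bottleneck(usage):
    """Cycles is the highest total use; the bottleneck is the resource reaching it."""
    bottleneck = max(usage, key=usage.get)
    return usage[bottleneck], bottleneck


block = ["add", "mul", "mul", "load"]
usage = resource_usage(block, MAPPING)
cycles, bottleneck = cycles_and_bottleneck(usage)
print(usage)               # per-resource totals
print(cycles, bottleneck)  # predicted cycles and the limiting resource
```

In this made-up example, the two `mul` instructions account for all of the use of `R1`, so they are the instructions to look at first when trying to relieve the bottleneck.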