Experienced developers have some intuition for how a hardware description maps to circuits, which lets them instinctively optimize code for circuit area or speed. But as is so often the case, checking is better than (educated) guessing.
When contributing to the D standard library, I learned that the D community has quite high standards for code quality. This includes complete test coverage, extensive code and documentation reviews, and runtime performance. However, when it comes to performance, the guiding principle was to write idiomatic code first and avoid premature optimization. If you write unidiomatic code as an optimization, don’t just assume that it will perform better; prove it with benchmarks. The idea behind this was that compilers are often much better at optimizing than we might first think.
Similar considerations apply for hardware development:
Larger gains are usually made through architectural choices anyway.
But whether it is architectural changes or micro-optimizations, we had better have a way to measure the results.
Running the design through a complete RTL2GDS flow or implementing a bitstream is one possible solution, but it is often too slow for rapid development iterations.
For a simple, quick estimate, synthesis results with technology mapping are sufficient.
Those can be obtained in only a few seconds using yosys.
Basic Setup
As always, I like to use a simple Makefile to combine all used commands. Here’s the basic template, which just sets up folders, source files etc.:
export TOP_MODULE = neosd
APP_SVERILOG = neosd_cmd_reg.sv \
neosd_top.sv
######################################################################
# Generated variables
######################################################################
export OBJDIR=build
HDLDIR=src/hdl
export SYN_SVERILOG_PATHS=$(addprefix $(HDLDIR)/,$(APP_SVERILOG))
QUIET_FLAG=
ifeq ($(strip $(VERBOSE)),)
QUIET_FLAG=-q
endif
######################################################################
# Rules
######################################################################
clean:
	rm -rf $(OBJDIR)
$(OBJDIR):
	mkdir -p $(OBJDIR)
This sets up the TOP_MODULE variable to contain the name of the module we want to synthesize.
Furthermore, APP_SVERILOG contains the list of source files, while the full paths to those are stored in SYN_SVERILOG_PATHS.
Note that some variables are exported.
Those are available to external tools, as we will need to access them in our synthesis scripts.
As for make rules, there are two basic ones:
The clean rule cleans up all generated files.
The $(OBJDIR) rule is used to create the build directory, as the clean rule deletes it completely.
This way we also don’t have to check empty build directories into git.
Estimating FPGA Size
Synthesizing for FPGAs is quite simple and was mostly covered in this previous post. Here’s the complete Makefile rule to synthesize for the Gowin FPGA targets:
$(OBJDIR)/gowin.syn.json: $(SYN_SVERILOG_PATHS) | $(OBJDIR)
	yosys -p "read_verilog -sv $(SYN_SVERILOG_PATHS); synth_gowin -top $(TOP_MODULE) -noflatten -json $@; tee -o $(OBJDIR)/gowin.syn.stat stat" $(QUIET_FLAG) -l $(OBJDIR)/gowin.syn.log
One thing to note is that we use -noflatten here to obtain hierarchical results.
While we’re at it, let’s also add a synth rule that synthesizes all targets and a summary rule to directly print the hardware resource usage:
synth: $(OBJDIR)/gowin.syn.json
summary:
	@echo
	@echo ========================== FPGA Summary ==========================
	@echo
	@cat $(OBJDIR)/gowin.syn.stat
Now we can just run make synth summary in any distrobox with yosys installed and get these results:
========================== FPGA Summary ==========================
4. Printing statistics.
=== neosd ===
Number of wires: 739
Number of wire bits: 1471
Number of public wires: 739
Number of public wire bits: 1471
Number of ports: 19
Number of port bits: 122
Number of memories: 0
Number of memory bits: 0
Number of processes: 0
Number of cells: 1163
ALU 8
DFFC 2
DFFCE 41
DFFE 209
DFFR 33
GND 1
IBUF 83
LUT1 209
LUT2 58
LUT3 46
LUT4 195
MUX2_LUT5 153
MUX2_LUT6 50
MUX2_LUT7 24
MUX2_LUT8 10
OBUF 39
VCC 1
sreg 1
=== sreg ===
Number of wires: 22
Number of wire bits: 66
Number of public wires: 22
Number of public wire bits: 66
Number of ports: 8
Number of port bits: 22
Number of memories: 0
Number of memory bits: 0
Number of processes: 0
Number of cells: 39
DFFCE 8
IBUF 13
LUT1 1
LUT3 8
OBUF 9
=== design hierarchy ===
neosd 1
sreg 1
Number of wires: 761
Number of wire bits: 1537
Number of public wires: 761
Number of public wire bits: 1537
Number of ports: 27
Number of port bits: 144
Number of memories: 0
Number of memory bits: 0
Number of processes: 0
Number of cells: 1201
ALU 8
DFFC 2
DFFCE 49
DFFE 209
DFFR 33
GND 1
IBUF 96
LUT1 210
LUT2 58
LUT3 54
LUT4 195
MUX2_LUT5 153
MUX2_LUT6 50
MUX2_LUT7 24
MUX2_LUT8 10
OBUF 48
VCC 1
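To turn such a report into a number you can track between iterations, a small script can total the interesting cell types. The following is a minimal Python sketch and not part of the original flow: the helper names parse_stat and summarize_fpga are made up for illustration, and the parser assumes the plain-text stat layout shown above (other yosys versions may format the report differently). It only accepts cell lines that follow a "Number of cells:" line, so the instance lines in the design hierarchy section are not mistaken for cells.

```python
import re

# Excerpt of the "design hierarchy" totals printed above.
SAMPLE = """
=== design hierarchy ===
neosd 1
sreg 1
Number of wires: 761
Number of cells: 1201
ALU 8
DFFC 2
DFFCE 49
DFFE 209
DFFR 33
LUT1 210
LUT2 58
LUT3 54
LUT4 195
MUX2_LUT5 153
"""

SECTION_RE = re.compile(r"^\s*===\s*(.+?)\s*===\s*$")
CELLS_RE = re.compile(r"^\s*Number of cells:\s*\d+\s*$")
CELL_RE = re.compile(r"^\s*(\S+)\s+(\d+)\s*$")


def parse_stat(text):
    """Return {section name: {cell type: count}} for a yosys stat report."""
    sections = {}
    current = None
    in_cells = False
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            current = sections.setdefault(header.group(1), {})
            in_cells = False
            continue
        if current is None:
            continue
        if CELLS_RE.match(line):
            # Everything of the form "<name> <count>" after this is a cell.
            in_cells = True
            continue
        cell = CELL_RE.match(line)
        if in_cells and cell:
            current[cell.group(1)] = int(cell.group(2))
    return sections


def summarize_fpga(cells):
    """Total the LUT1..LUT4 and DFF* cells of one report section."""
    luts = sum(n for name, n in cells.items() if re.fullmatch(r"LUT\d", name))
    ffs = sum(n for name, n in cells.items() if name.startswith("DFF"))
    return {"LUT": luts, "FF": ffs}


if __name__ == "__main__":
    totals = parse_stat(SAMPLE)["design hierarchy"]
    print(summarize_fpga(totals))  # {'LUT': 517, 'FF': 293}
```

The MUX2_LUT* cells are deliberately left out of the LUT total in this sketch; whether to count them depends on what you want to compare.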
Estimating ASIC Size
Let’s do the same thing for an ASIC target.
Unfortunately, things are a bit more complex for ASICs.
Let’s first extend the Makefile synth and summary rules:
synth: $(OBJDIR)/gowin.syn.json $(OBJDIR)/ihp.syn.json summary
summary:
	@echo
	@echo ========================== FPGA Summary ==========================
	@echo
	@cat $(OBJDIR)/gowin.syn.stat
	@echo
	@echo ========================== IHP Summary ==========================
	@echo
	@cat $(OBJDIR)/ihp.syn.stat
Then let’s add the new synthesis rule.
We’re going to use the IHP SG13G2 BiCMOS SiGe PDK here.
As we’ll need some PDK-related files, we need to export the FLOW_HOME variable and make it point to the flow directory in a copy of the OpenROAD Flow Scripts (ORFS):
export FLOW_HOME=/home/jpfau/Dokumente/orfs/flow
$(OBJDIR)/ihp.syn.json: $(SYN_SVERILOG_PATHS) | $(OBJDIR)
	yosys syn_ihp.tcl $(QUIET_FLAG) -l $(OBJDIR)/ihp.syn.log
As the ASIC synthesis script is much more complex, we won’t provide it inline using the -p option and will save it to syn_ihp.tcl instead.
My synthesis script is a stripped-down version of the one shipped in ORFS.
Some steps have been removed, so the synthesis result will be less optimal than what you get from putting your HDL through the full ORFS flow. On the other hand, this simplification makes the script easier to maintain and understand. The script also hard-codes some information for the IHP PDK, so it cannot easily be used with other PDKs. Here’s the full synthesis script:
# Import yosys commands
yosys -import
# PDK setup
set pdk_platform_dir $::env(FLOW_HOME)/platforms/ihp-sg13g2
set pdk_scripts_dir $::env(FLOW_HOME)/scripts
set pdk_stdcell_lib $pdk_platform_dir/lib/sg13g2_stdcell_typ_1p20V_25C.lib
set pdk_dont_use_cells {sg13g2_lgcp_1 sg13g2_sighold sg13g2_slgcp_1 sg13g2_dfrbp_2}
set pdk_latch_map $pdk_platform_dir/cells_latch.v
set pdk_tiehi_cell_port {sg13g2_tiehi L_HI}
set pdk_tielo_cell_port {sg13g2_tielo L_LO}
# Read app sources
read_verilog -defer -sv {*}$::env(SYN_SVERILOG_PATHS)
# Read stdcells
read_liberty -overwrite -setattr liberty_cell -lib $pdk_stdcell_lib
hierarchy -check -top $::env(TOP_MODULE)
# Synthesize, don't flatten
synth -run :fine
# Remove non-synthesizable stuff
chformal -remove
delete t:\$print
# Optimize
opt -purge
# Technology mapping
techmap
techmap -map $pdk_latch_map
# Map D FFs to stdcell library
set dont_use_args ""
foreach cell $pdk_dont_use_cells {
    lappend dont_use_args -dont_use $cell
}
dfflibmap -liberty $pdk_stdcell_lib {*}$dont_use_args
opt
# ABC optimization
abc -script $pdk_scripts_dir/abc_area.script -liberty $pdk_stdcell_lib {*}$dont_use_args
# Set undefined values to 0
setundef -zero
# Remove unused stuff
opt_clean -purge
# Technology mapping for constant 1/0
hilomap -singleton \
    -hicell {*}$pdk_tiehi_cell_port \
    -locell {*}$pdk_tielo_cell_port
# Write out design
json -o $::env(OBJDIR)/ihp.syn.json
# Reports
tee -o $::env(OBJDIR)/ihp.syn.check check
tee -o $::env(OBJDIR)/ihp.syn.stat stat -liberty $pdk_stdcell_lib
# Check that we mapped everything to std cells
check -assert -mapped
The comments in the script should provide some hints about what it is doing.
Finally, here’s our make synth output for the ASIC part:
========================== IHP Summary ==========================
17. Printing statistics.
=== neosd ===
Number of wires: 1670
Number of wire bits: 2013
Number of public wires: 38
Number of public wire bits: 381
Number of ports: 19
Number of port bits: 122
Number of memories: 0
Number of memory bits: 0
Number of processes: 0
Number of cells: 1634
sg13g2_a21oi_1 78
sg13g2_a21oi_2 1
sg13g2_a221oi_1 11
sg13g2_a22oi_1 45
sg13g2_and2_1 2
sg13g2_and3_1 1
sg13g2_and4_1 2
sg13g2_buf_1 12
sg13g2_buf_2 35
sg13g2_buf_4 39
sg13g2_buf_8 19
sg13g2_dfrbp_1 285
sg13g2_inv_1 107
sg13g2_inv_2 10
sg13g2_inv_4 1
sg13g2_mux2_1 18
sg13g2_nand2_1 467
sg13g2_nand2_2 8
sg13g2_nand2b_1 8
sg13g2_nand3_1 48
sg13g2_nand3b_1 1
sg13g2_nand4_1 138
sg13g2_nor2_1 117
sg13g2_nor2_2 20
sg13g2_nor2b_1 61
sg13g2_nor2b_2 1
sg13g2_nor3_1 12
sg13g2_nor3_2 2
sg13g2_nor4_1 19
sg13g2_o21ai_1 62
sg13g2_tiehi 1
sg13g2_tielo 1
sg13g2_xnor2_1 1
sreg 1
Area for cell type \sreg is unknown!
Chip area for module '\neosd': 25358.167800
of which used for sequential elements: 13444.704000 (53.02%)
=== sreg ===
Number of wires: 65
Number of wire bits: 79
Number of public wires: 8
Number of public wire bits: 22
Number of ports: 8
Number of port bits: 22
Number of memories: 0
Number of memory bits: 0
Number of processes: 0
Number of cells: 57
sg13g2_dfrbp_1 8
sg13g2_inv_2 1
sg13g2_nand2_1 24
sg13g2_nand2b_1 8
sg13g2_nor2_2 8
sg13g2_nor2b_2 8
Chip area for module '\sreg': 820.108800
of which used for sequential elements: 377.395200 (46.02%)
=== design hierarchy ===
neosd 1
sreg 1
Number of wires: 1735
Number of wire bits: 2092
Number of public wires: 46
Number of public wire bits: 403
Number of ports: 27
Number of port bits: 144
Number of memories: 0
Number of memory bits: 0
Number of processes: 0
Number of cells: 1690
sg13g2_a21oi_1 78
sg13g2_a21oi_2 1
sg13g2_a221oi_1 11
sg13g2_a22oi_1 45
sg13g2_and2_1 2
sg13g2_and3_1 1
sg13g2_and4_1 2
sg13g2_buf_1 12
sg13g2_buf_2 35
sg13g2_buf_4 39
sg13g2_buf_8 19
sg13g2_dfrbp_1 293
sg13g2_inv_1 107
sg13g2_inv_2 11
sg13g2_inv_4 1
sg13g2_mux2_1 18
sg13g2_nand2_1 491
sg13g2_nand2_2 8
sg13g2_nand2b_1 16
sg13g2_nand3_1 48
sg13g2_nand3b_1 1
sg13g2_nand4_1 138
sg13g2_nor2_1 117
sg13g2_nor2_2 28
sg13g2_nor2b_1 61
sg13g2_nor2b_2 9
sg13g2_nor3_1 12
sg13g2_nor3_2 2
sg13g2_nor4_1 19
sg13g2_o21ai_1 62
sg13g2_tiehi 1
sg13g2_tielo 1
sg13g2_xnor2_1 1
Chip area for top module '\neosd': 26178.276600
of which used for sequential elements: 377.395200 (1.44%)
The most reliable information that can be obtained here is the number and type of gates. Chip area after synthesis does not account for routing congestion or other placement-related issues and overhead, so it is only a rough estimate.
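If you still want to track the area figures, they can be pulled out of the report with a regular expression. This is a minimal Python sketch under a few assumptions: parse_areas and sequential_share are hypothetical helper names, the line format is the one printed above, and the values are in whatever area unit the Liberty file uses. In the report above, the sequential figure on the top-level total line equals the sreg value, which looks like it only reflects the submodule, so the sketch sums the per-module lines instead.

```python
import re

# Area lines copied from the IHP report above (raw string keeps the backslashes).
SAMPLE = r"""
Chip area for module '\neosd': 25358.167800
of which used for sequential elements: 13444.704000 (53.02%)
Chip area for module '\sreg': 820.108800
of which used for sequential elements: 377.395200 (46.02%)
Chip area for top module '\neosd': 26178.276600
of which used for sequential elements: 377.395200 (1.44%)
"""

AREA_RE = re.compile(
    r"Chip area for (top )?module '\\?([^']+)':\s*([\d.]+)\s*"
    r"of which used for sequential elements:\s*([\d.]+)"
)


def parse_areas(text):
    """Return ({module: (area, sequential_area)}, top_level_total_area)."""
    per_module, top_total = {}, None
    for is_top, name, area, seq in AREA_RE.findall(text):
        if is_top:
            # The "top module" line is the sum over the whole hierarchy.
            top_total = float(area)
        else:
            per_module[name] = (float(area), float(seq))
    return per_module, top_total


def sequential_share(per_module, top_total):
    """Fraction of the total area taken by sequential cells, summed per module."""
    return sum(seq for _, seq in per_module.values()) / top_total


if __name__ == "__main__":
    modules, total = parse_areas(SAMPLE)
    for name, (area, seq) in modules.items():
        print(f"{name}: {area:.1f} total, {seq:.1f} sequential")
    print(f"sequential share of design: {sequential_share(modules, total):.1%}")
```

Note that the per-module areas add up to the top-level total, which is a handy sanity check when parsing reports like this.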
Conclusion
With the scripts offered here you can quickly evaluate the size of Verilog designs on FPGA and ASIC. Of course the obtained results are somewhat technology specific.
For the FPGA target, the main difference from other vendors’ FPGAs is the LUT size. The absolute number of LUTs used therefore can’t be compared between different FPGA types. However, the resource count can be a useful relative metric to guide optimization of your designs. In addition, be careful if the synthesis uses special cells, which might be vendor-specific (such as the ALU one).
For the ASIC target, results depend on the standard cell library and PDK you use. Area depends on the technology node, so gate count is the more reliable metric. The number of gates used will, however, also depend on what kinds of gates are available in your standard cell library. Again, the obtained metrics are mainly useful to compare different iterations of a design in the same technology.
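Such a comparison between two iterations boils down to a per-cell-type difference. Here is a minimal Python sketch of that idea; diff_cells is a hypothetical helper, the "before" counts are the sreg numbers from the IHP report above, and the "after" counts are invented purely to illustrate the output, they are not measured results.

```python
def diff_cells(before, after):
    """Return {cell type: after - before} for every type whose count changed."""
    deltas = {}
    for name in sorted(set(before) | set(after)):
        delta = after.get(name, 0) - before.get(name, 0)
        if delta:
            deltas[name] = delta
    return deltas


# Cell counts of the sreg module as reported above.
BEFORE = {
    "sg13g2_dfrbp_1": 8,
    "sg13g2_inv_2": 1,
    "sg13g2_nand2_1": 24,
    "sg13g2_nand2b_1": 8,
    "sg13g2_nor2_2": 8,
    "sg13g2_nor2b_2": 8,
}

# HYPOTHETICAL counts for a reworked sreg, made up for this example only.
AFTER = {
    "sg13g2_dfrbp_1": 8,
    "sg13g2_inv_2": 1,
    "sg13g2_mux2_1": 8,
    "sg13g2_nand2_1": 16,
    "sg13g2_nor2_2": 8,
}

if __name__ == "__main__":
    for name, delta in diff_cells(BEFORE, AFTER).items():
        print(f"{name:20s} {delta:+d}")
    print(f"total cells: {sum(BEFORE.values())} -> {sum(AFTER.values())}")
```

Cell types that disappear or newly appear show up with their full count as a negative or positive delta, so a change in the kind of gates used is as visible as a change in their number.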
Be careful when you start using specific hard IP: make sure to also specify it using black boxes. If you rely on inference for IP, chances are yosys will not map it to dedicated blocks but will synthesize it to basic cells instead.