Benchmark Test Report on VeritakSV3.83A-alpha and MXE6.5C
Updated Oct.22.2010
Tak.Sugawara
1. Purpose
To report a performance comparison between the new version of VeritakSV and MXE6.5C.
2. Test Condition
| Item | Description | Remarks |
|---|---|---|
| Machine | Q9550 (2.87GHz), DDR2 8GB memory; Core i7 920, DDR3 6GB memory | |
| OS | Vista Ultimate 32bit (Q9550); Windows7 64bit (Core i7) | |
| Test Bench | Verilator's Site Bench / FZ80 / etc. | |
| Simulator | VeritakSV3.83A(0.39α) / Veritak3.75(Basic/Pro); MXE6.5C (not Starter; full Xilinx Edition of Modelsim) | |
| Measured Time | From simulation start to simulation end. Compile time is not included. | |
| Switch | Veritak: Optimized Debug: Normal / Level/2/NBA/Fast | |
3. Test Result
Times are measured from the start to the finish of the simulation.
Machine: Q9550, Vista Ultimate 32bit, 8GB DDR2
| Category | Bench Test Name | Veritak-Pro (sec) | VeritakSV w/o waveform (sec) | VeritakSV w/ waveform (sec) | MXE w/o waveform (sec) | MXE w/ waveform (sec) | MXE/VeritakSV w/o waveform | MXE/VeritakSV w/ waveform | VeritakSV Remarks | Project File | State Save File | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gate | C6288(ISCAS'85) | 14 | 0.38 | 0.83 | 57.44 | 151.96 | 151x | 84x | CycleBased | c6288_load.vtakprj | c6288_383.vtaksave | bench_mark.zip |
| Altera | Altera PLL | 15 | 2.69 | 3.25 | 9.36 | 20.53 | 3.59x | 6.3x | | | | |
| | ddr | 34 | 10.0 | 11.8 | | | | | | | | |
| | ddr3 | 78 | 28 | 37 | | | | | | ddr3_load.vtakprj | ddr3_383.vtaksave | |
| | ddr2-avalon | 20 | 6.18 | 7.0 | | | | | | ddr2_avalon_load.vtakprj | ddr2_avalon_383.vtaksave | |
| | rapid-io | 121 | 27.6 | 35.16 | | | | | | rapid_io_load.vtakprj | rapid_io_383.vtaksave | |
| | pci-express | - | 19 | 25.5 | | | | | | pci_express_load.vtakprj | pci_express_383.vtaksave | |
| Xilinx | DCM | 69 | 12.8 | 17 | 21.12 | 39.34 | 1.65x | 2.31x | | | | |
| | RAM36 | 17 | 7 | 7.2 | 142 | 149.1 | 20.29x | 20.71x | | | | |
| | pci-xilinx | 15 | 4.92 | 7.1 | | | | | | | | |
| | ddr2 -m | 2 | 0.86 | 0.96 | | | | | | | | |
| | ddr2-mig33 | | | | | | | | | ddr2_load.vtakprj | ddr2_save_383.vtaksave | |
| | ddr3-11.1 | 204 | 91 | 101 | | | | | | ddr3_load.vtakprj | ddr3_383.vtaksave | |
| Small Design | WB_Z80(opencores) | 28 | 6.1 | 6.57 | 69.56 | 83.87 | 12.4x | 12.8x | | | | |
| | DIV(opencores) | 136 | 63 | 70 | 556 | 682 | 8.83x | 9.74x | | | | |
| | USB1.1(opencores) | 21 | 8.7 | 9.43 | 1013 | 120 | 11.61x | 12.72x | | | | |
| | m68k(opencores) | 137 | 32 | 119 | 562 | 1147 | 17.56x | 9.64x | | | | |
| | ata(opencores) | 107 | 36.6 | 107 | 732 | 1161 | 20.0x | 10.85x | | | | |
| | tv80(opencores) | 10 | 5.3 | 6.0 | 31.12 | 75.69 | 5.87x | 12.62x | | | | |
| | fz80 | 15 | 5.43 | 6.43 | 42.6 | 71.17 | 8.82x | 11.07x | CycleBased | | | |
| | AES(sugawara-systems.com) | 7 | 2 | 2.84 | 18.22 | 67.61 | 9.12x | 23.78x | CycleBased | | | |
| | VGA(user contributed) | 17 | 9 | 19 | 140 | 292 | 15.56x | 15.37x | | | | |
| | conmux(opencores) | 14 | 4.75 | 5.15 | 82.75 | 148 | 17.43x | 28.74x | | | | |
| | AC97(opencores) | 392 | 202.9 | 230 | 1868 | 2030 | 9.21x | 8.83x | | | | |
| | H8(sugawara-systems.com) | 5 | 2.05 | 2.4 | | | | | | | | |
| | YACC-w/o cache(opencores) | | 65 | 81 | | | | | | | | |
| | YACC-w/ cache(sugawara-systems.com) | 215 | 135 | 165 | | | | | | | | |
| | openrisc(opencores) | 5 | 1.48 | 1.84 | 14.58 | 17.26 | 9.82x | 17.26x | | | | |
| | A0(sugawara-systems.com) | 4 | 2.71 | 2.96 | 11.08 | 16.44 | 4.09x | 5.55x | | | | |
| | LatticeMico32 | 387 | 126 | 146 | 765 | 2018 | 6.07x | 13.82x | | | | |
| | fpu(opencores) | 1636 | 432 | 750 | 3844 | 12950 | 8.90x | 17.5x | | | | |
| Large Design | PCI(opencores) | | 305 | 557 | 2482 | 5533 | 8.06x | 9.93x | | | | |
| | Ethernet(opencores) | | 322 | 476 | Iterationlimit | - | | | | | | |
| Basic Component | base_test_bench_delay | 9 | 6.04 | | 25.11 | | 4.15x | | | | | bench_mark.zip |
| | base_test_bench_nba_delay | 10 | 7.27 | | 54.78 | | 7.54x | | | | | |
| | base_test_bench_prop | 8 | 0.53 | | 2.92 | | 5.51x | | | | | |
| | base_test_bench_prop_delay | 50 | 8.35 | | 76 | | 9.11x | | | | | |
| | base_test_bench_prop_nba | 53 | 10.61 | | 107 | | 10.08x | | CycleBased | base_test_bench_prop_nba_load.vtakprj | base_test_bench_prop_nba_383.vtaksave | |
| | base_test_bench | 3 | 0.43 | | 1.36 | | 3.16x | | | base_test_bench_load.vtakpr | base_test_bench_383.vtaksave | |
| | base_test_bench_inv | 3 | 0.44 | | 1.32 | | 3.0x | | | | | |
4. Consideration
4.0 Performance
The performance gain is at least 3x in 95% of the benches. On average, a 10x
performance gain over MXE can be expected. MXE is said to run at 40% of the
speed of PE (SE is 1-3x the speed of PE).
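The MXE/VeritakSV columns in section 3 are plain ratios of the measured times. As a minimal sketch, the following recomputes the ratio for three rows of the result table (DCM, RAM36, DIV). The function name `speedup` and the dictionary layout are illustrative choices, not part of either simulator.

```python
def speedup(mxe_sec, veritaksv_sec):
    """Return the MXE time divided by the VeritakSV time for one bench."""
    return mxe_sec / veritaksv_sec

# Measured times in seconds, copied from the result table in section 3:
# bench name -> ((MXE, VeritakSV) without waveform, (MXE, VeritakSV) with waveform)
rows = {
    "DCM": ((21.12, 12.8), (39.34, 17)),
    "RAM36": ((142, 7), (149.1, 7.2)),
    "DIV(opencores)": ((556, 63), (682, 70)),
}

for name, (no_wave, with_wave) in rows.items():
    print(f"{name}: {speedup(*no_wave):.2f}x w/o waveform, "
          f"{speedup(*with_wave):.2f}x w/ waveform")
```

For these three rows the recomputed values agree with the ratios listed in the table to two decimal places.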
4.1 Waveform Addition
In general, a high-performance simulator slows down when waveforms are added,
particularly when the entire design is recorded. This is because waveform
operations are relatively heavier than the simulation engine itself.
VeritakSV has a solution for this problem: waveform operations are assigned
to a separate CPU, so that performance degradation is minimized on dual-CPU
platforms.
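The idea of handing waveform work to a separate CPU can be illustrated with a producer/consumer hand-off. The sketch below is not VeritakSV's implementation; it is a toy model in which a hypothetical engine loop (`run_simulation`) pushes value changes into a queue and a second thread (`waveform_writer`) records them. All names are made up for illustration, and because of CPython's global interpreter lock the two threads here do not actually run on two cores at once; the sketch only shows the structure of the hand-off.

```python
import queue
import threading

def run_simulation(n_cycles, sink):
    """Toy engine: toggle one signal per cycle and pass each change to sink."""
    value = 0
    for t in range(n_cycles):
        value ^= 1
        sink((t, "clk", value))  # (time, signal name, new value)

def simulate_with_waveform_thread(n_cycles):
    """Run the toy engine while a separate thread records the value changes."""
    changes = queue.Queue()
    recorded = []

    def waveform_writer():
        # Consume value changes until the engine sends the None sentinel.
        while True:
            item = changes.get()
            if item is None:
                break
            recorded.append(item)

    writer = threading.Thread(target=waveform_writer)
    writer.start()
    run_simulation(n_cycles, changes.put)  # the engine only enqueues changes
    changes.put(None)                      # signal end of simulation
    writer.join()
    return recorded

print(simulate_with_waveform_thread(4))
```

In this arrangement the engine loop pays only the cost of enqueueing a change, while the recording work happens on the other thread.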
4.2 32bit limitation
In a 32bit OS environment, the address space of an application is restricted
to 2GB (2GB is for the application, the other 2GB is for the OS). Disk
operations therefore cannot be avoided when simulating for a long time or
simulating a large design, even if you have more than 4GB of RAM. Disk
operation is a bottleneck; in fact, some benches above show this kind of
degradation.
| Category | Bench Test Name | VeritakSV w/o waveform (sec), 8GB 32bit Vista | VeritakSV w/ waveform (sec), 8GB 32bit Vista | VeritakSV64 w/o waveform (sec), 6GB 64bit Windows7 | VeritakSV64 w/ waveform (sec), 6GB 64bit Windows7 | MXE w/o waveform (sec), 8GB 32bit Vista | MXE w/ waveform (sec), 8GB 32bit Vista | MXE/VeritakSV64 w/o waveform | MXE/VeritakSV64 w/ waveform |
|---|---|---|---|---|---|---|---|---|---|
| Small | ATA | 36.6 | 107 | 32 | 40.7 | 731 | 1161 | 24x | 29x |
| Small | VGA | 9 | 19 | 8.2 | 11.6 | 140 | 292 | 17x | 24x |
| Small | M68K | 32 | 119 | 30 | 40.1 | 562 | 1147 | 19x | 29x |
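One way to read the 32bit/64bit comparison is to look at how much the run time grows when waveforms are added. The sketch below computes that growth factor (time with waveform divided by time without) from the VeritakSV and VeritakSV64 figures in the table above. The name `waveform_overhead` is an illustrative choice, not a term used by the simulator.

```python
def waveform_overhead(without_sec, with_sec):
    """Return how many times longer a run takes once waveforms are recorded."""
    return with_sec / without_sec

# (w/o waveform, w/ waveform) in seconds, copied from the table above
runs = {
    "ATA": {"32bit": (36.6, 107), "64bit": (32, 40.7)},
    "VGA": {"32bit": (9, 19), "64bit": (8.2, 11.6)},
    "M68K": {"32bit": (32, 119), "64bit": (30, 40.1)},
}

for bench, by_os in runs.items():
    for os_name, (without_sec, with_sec) in by_os.items():
        factor = waveform_overhead(without_sec, with_sec)
        print(f"{bench} {os_name}: {factor:.2f}x slower with waveform")
```

For ATA, for example, the factor is about 2.9 on the 32bit setup and about 1.3 on the 64bit setup, which matches the degradation described in this section.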
4.3 64bit simulator on 64bit OS
To overcome the 4GB barrier, a 64bit OS is a must. Using a 64bit OS and a
multi-threaded simulator with a lot of (inexpensive) DDR memory, a drastic
performance gain can be expected. One problem is compatibility with 32bit
VPI/DPI/DPI-SC code when a 64bit simulator is used. VeritakSV's engine runs
as 32bit on WOW64, while waveform operations run as native 64bit, so there is
no need to worry about compatibility.
4.4 Native Debugger
Since VeritakSV has a native debugger, there is no separate debug mode. While
running at full speed, you can place breakpoints on structural net
assignments as well as procedural assignments, with tool-tips available in
single-step runs.
4.5 Future enhancement
Cross compilation (64bit -> 32bit) will be considered for a future release.
4.6 Download Simulator
We attach a restricted version of the simulator. This simulator does not
compile any source, but it can simulate the benches above using the project
and state save files listed in section 3. Please note that the simulator can
be installed on a 64bit OS only. We will be glad to upload the objects that
you would like to simulate if you supply the sources.
VeritakWinSV64_383B Build Oct.22.2010
5. Conclusion
