Benchmark Test Report on VeritakSV3.83A-alpha and MXE6.5C

                                                                                      Updated Oct.22.2010
                                                                                      Tak.Sugawara
1. Purpose

 To report a performance comparison between the new version of VeritakSV and MXE6.5C.

2. Test Condition

Item            Description / Remarks
Machine         Q9550(2.87GHz), DDR2 8GB memory
                Core i7 920, DDR3 6GB memory
OS              Vista Ultimate 32bit (Q9550)
                Windows7 64bit (Core i7)
Test Bench      Verilator's Site Bench / FZ80 / Etc.
Simulator       VeritakSV3.83A(0.39α) / Veritak3.75(Basic/Pro)
                MXE6.5C (Not Starter. Full Xilinx Edition of Modelsim)
Measured Time   From simulation start to simulation end. Compile time is not included.
                Veritak: Optimized Debug: Normal/ Level/2/NBA/Fast Switch


3. Test Result
Times are measured from the start to the finish of the simulation.

Machine: Q9550, Vista Ultimate 32bit, 8GB DDR2

(w/o = without waveform, w/ = with waveform, - = no value given)

Category         Bench Test Name                      Veritak-Pro  VeritakSV(sec)  MXE(sec)                 MXE/VeritakSV   VeritakSV
                                                      (sec)        w/o     w/      w/o              w/      w/o     w/      Remarks
Gate             C6288(ISCAS'85)                      14           0.38    0.83    57.44            151.96  151x    84x     CycleBased
Altera           Altera PLL                           15           2.69    3.25    9.36             20.53   3.59x   6.3x
Altera           ddr                                  34           10.0    11.8    -                -       -       -
Altera           ddr3                                 78           28      37      -                -       -       -
Altera           ddr2-avalon                          20           6.18    7.0     -                -       -       -
Altera           rapid-io                             121          27.6    35.16   -                -       -       -
Altera           pci-express                          -            19      25.5    -                -       -       -
Xilinx           DCM                                  69           12.8    17      21.12            39.34   1.65x   2.31x
Xilinx           RAM36                                17           7       7.2     142              149.1   20.29x  20.71x
Xilinx           pci-xilinx                           15           4.92    7.1     -                -       -       -
Xilinx           ddr2 -m                              2            0.86    0.96    -                -       -       -
Xilinx           ddr2-mig33                           -            -       -       -                -       -       -
Xilinx           ddr3-11.1                            204          91      101     -                -       -       -
Small Design     WB_Z80(opencores)                    28           6.1     6.57    69.56            83.87   12.4x   12.8x
Small Design     DIV(opencores)                       136          63      70      556              682     8.83x   9.74x
Small Design     USB1.1(opencores)                    21           8.7     9.43    1013             120     11.61x  12.72x
Small Design     m68k(opencores)                      137          32      119     562              1147    17.56x  9.64x
Small Design     ata(opencores)                       107          36.6    107     732              1161    20.0x   10.85x
Small Design     tv80(opencores)                      10           5.3     6.0     31.12            75.69   5.87x   12.62x
Small Design     fz80                                 15           5.43    6.43    42.6             71.17   8.82x   11.07x  CycleBased
Small Design     AES(sugawara-systems.com)            7            2       2.84    18.22            67.61   9.12x   23.78x  CycleBased
Small Design     VGA(user contributed)                17           9       19      140              292     15.56x  15.37x
Small Design     conmux(opencores)                    14           4.75    5.15    82.75            148     17.43x  28.74x
Small Design     AC97(opencores)                      392          202.9   230     1868             2030    9.21x   8.83x
Small Design     H8(sugawara-systems.com)             5            2.05    2.4     -                -       -       -
Small Design     YACC-w/o cache(opencores)            -            65      81      -                -       -       -
Small Design     YACC-w/ cache(sugawara-systems.com)  215          135     165     -                -       -       -
Small Design     openrisc(opencores)                  5            1.48    1.84    14.58            17.26   9.82x   17.26x
Small Design     A0(sugawara-systems.com)             4            2.71    2.96    11.08            16.44   4.09x   5.55x
Small Design     LatticeMico32                        387          126     146     765              2018    6.07x   13.82x
Small Design     fpu(opencores)                       1636         432     750     3844             12950   8.90x   17.5x
Large Design     PCI(opencores)                       -            305     557     2482             5533    8.06x   9.93x
Large Design     Ethernet(opencores)                  -            322     476     Iteration limit  -       -       -
Basic Component  base_test_bench_delay                9            6.04    -       25.11            -       4.15x   -
Basic Component  base_test_bench_nba_delay            10           7.27    -       54.78            -       7.54x   -
Basic Component  base_test_bench_prop                 8            0.53    -       2.92             -       5.51x   -
Basic Component  base_test_bench_prop_delay           50           8.35    -       76               -       9.11x   -
Basic Component  base_test_bench_prop_nba             53           10.61   -       107              -       10.08x  -       CycleBased
Basic Component  base_test_bench                      3            0.43    -       1.36             -       3.16x   -
Basic Component  base_test_bench_inv                  3            0.44    -       1.32             -       3.0x    -

Source/project files

Bench Test Name           Project File                           State Save File                        Source
C6288(ISCAS'85)           c6288_load.vtakprj                     c6288_383.vtaksave                     bench_mark.zip
ddr3                      ddr3_load.vtakprj                      ddr3_383.vtaksave
ddr2-avalon               ddr2_avalon_load.vtakprj               ddr2_avalon_383.vtaksave
rapid-io                  rapid_io_load.vtakprj                  rapid_io_383.vtaksave
pci-express               pci_express_load.vtakprj               pci_express_383.vtaksave
ddr2-mig33                ddr2_load.vtakprj                      ddr2_save_383.vtaksave
ddr3-11.1                 ddr3_load.vtakprj                      ddr3_383.vtaksave
base_test_bench_delay     -                                      -                                      bench_mark.zip
base_test_bench_prop_nba  base_test_bench_prop_nba_load.vtakprj  base_test_bench_prop_nba_383.vtaksave
base_test_bench           base_test_bench_load.vtakpr            base_test_bench_383.vtaksave

4. Consideration
4.0 Performance 

VeritakSV is at least 3x faster than MXE in 95% of the benches, and about 10x the performance of MXE is expected on average. MXE is said to run at 40% of the speed of PE (SE is 1-3x of PE).
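
As a rough illustration of how an MXE/VeritakSV ratio is obtained, the sketch below divides the MXE time by the VeritakSV time (both without waveform) for a few rows copied from the table in section 3. The function names and the choice of rows are assumptions made for this example only.

```python
# Times in seconds without waveform, copied from the table in section 3:
# bench name -> (VeritakSV, MXE)
TIMES = {
    "RAM36": (7.0, 142.0),
    "DIV(opencores)": (63.0, 556.0),
    "base_test_bench": (0.43, 1.36),
    "base_test_bench_prop": (0.53, 2.92),
}


def speedup(sv_sec, mxe_sec):
    """Return how many times faster VeritakSV is than MXE."""
    return mxe_sec / sv_sec


def share_at_least(times, threshold):
    """Return the fraction of benches whose speedup is >= threshold."""
    hits = sum(1 for sv, mxe in times.values() if speedup(sv, mxe) >= threshold)
    return hits / len(times)


if __name__ == "__main__":
    for name, (sv, mxe) in TIMES.items():
        print(f"{name}: {speedup(sv, mxe):.2f}x")
    print(f"share of benches at 3x or more: {share_at_least(TIMES, 3.0):.0%}")
```

The same two functions can be applied to the full table to check the share of benches above a given threshold.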

4.1 Waveform Addition
 In general, a high-performance simulator slows down when waveforms are added, particularly when the entire design is recorded. This is because waveform operations are relatively heavier than the simulation engine itself. VeritakSV has a solution to this problem: waveform operations are assigned to a separate CPU, so that performance degradation is minimized on dual-CPU platforms.
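
The general idea of moving waveform work off the simulation thread can be sketched as a producer/consumer hand-off. This is not VeritakSV's implementation, which is not described here; the signal name `clk`, the queue size, and both function names are hypothetical. Python threads share one interpreter lock, so the sketch shows only the structure, while a real simulator would use native threads on separate cores.

```python
import queue
import threading


def waveform_writer(changes, recorded):
    """Consume value-change records until the None sentinel arrives."""
    while True:
        item = changes.get()
        if item is None:
            break
        recorded.append(item)  # stand-in for the heavy waveform bookkeeping


def simulate(cycles):
    """Toggle a hypothetical 'clk' signal and hand each change to the writer."""
    changes = queue.Queue(maxsize=1024)  # bounded so the engine cannot run far ahead
    recorded = []
    writer = threading.Thread(target=waveform_writer, args=(changes, recorded))
    writer.start()
    clk = 0
    for t in range(cycles):
        clk ^= 1                      # the "simulation engine" work
        changes.put((t, "clk", clk))  # cheap hand-off instead of recording inline
    changes.put(None)                 # tell the writer to finish
    writer.join()
    return recorded
```

The engine loop only pays for the queue hand-off; the recording cost is carried by the writer thread.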
4.2 32bit limitation
In a 32bit OS environment, the address space available to an application is restricted to 2GB (2GB is for the application and the other 2GB is for the OS). Disk operations therefore cannot be avoided when simulating for a long time or simulating a large design, even if you have more than 4GB of RAM. Disk operation is a bottleneck; in fact, some benches above show this kind of degradation.

(w/o = without waveform, w/ = with waveform)

Category  Bench Test Name  VeritakSV(sec)      VeritakSV64(sec)    MXE(sec)            MXE/VeritakSV64
                           8GB-32bitVista      6GB-64bitWindows7   8GB-32bitVista
                           w/o       w/        w/o       w/        w/o       w/        w/o       w/
Small     ATA              36.6      107       32        40.7      731       1161      24x       29x
Small     VGA              9         19        8.2       11.6      140       292       17x       24x
Small     M68K             32        119       30        40.1      562       1147      19x       29x
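
To make the degradation visible, the sketch below computes how much each bench slows down when waveforms are added, using the VeritakSV and VeritakSV64 columns of the table above. Because the two columns were measured on different machines, only the with/without factor is compared, not the absolute times. The dictionary and function names are assumptions made for this example.

```python
# (w/o waveform, w/ waveform) in seconds, copied from the table above
SV32 = {"ATA": (36.6, 107.0), "VGA": (9.0, 19.0), "M68K": (32.0, 119.0)}
SV64 = {"ATA": (32.0, 40.7), "VGA": (8.2, 11.6), "M68K": (30.0, 40.1)}


def waveform_overhead(without_sec, with_sec):
    """Return the slowdown factor caused by recording waveforms."""
    return with_sec / without_sec


if __name__ == "__main__":
    for name in SV32:
        o32 = waveform_overhead(*SV32[name])
        o64 = waveform_overhead(*SV64[name])
        print(f"{name}: 32bit {o32:.2f}x, 64bit {o64:.2f}x")
```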

4.3 64bit simulator on 64 bit OS

To overcome the 4GB barrier, a 64bit OS is a must. Using a 64bit OS and a multi-threaded simulator with a lot of (inexpensive) DDR memory, a drastic performance gain can be expected. One problem is compatibility with 32bit VPI/DPI/DPI-SC code when a 64bit simulator is used. VeritakSV's engine runs as 32bit on WOW64, while waveform operations run as native 64bit, so there is no need to worry about compatibility.

4.4 Native Debugger
Since VeritakSV has a native debugger, there is no separate debug mode. While running at full speed, you can place breakpoints on structural net assignments as well as procedural assignments, and tool-tips are available in single-step runs.

4.5 Future enhancement
Cross compilation (64bit->32bit) will be considered for a future implementation.

4.6 Download Simulator

 We attach a restricted version of the simulator. This simulator does not compile any source, but it can simulate the benches above by the following procedure. Please note that the simulator can be installed on a 64bit OS only. We will be glad to upload the objects that you would like to simulate if you supply the sources.

Procedure:
  1. Download and install the simulator.
  2. Download the Veritak project file for the benchmark you would like to run.
  3. Download the StateSave file (*.vtaksave) and place it in the same folder as the project file from step 2.
  4. Load the project and run.
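
Before loading a project, the file placement described in steps 2 and 3 can be checked with a small script. This is only a convenience sketch; `ready_to_run` is a hypothetical helper, not part of the simulator, and it does not start the simulation.

```python
from pathlib import Path


def ready_to_run(project_file, state_save_file):
    """Check steps 2 and 3: both files exist and sit in the same folder."""
    prj = Path(project_file).resolve()
    sav = Path(state_save_file).resolve()
    return prj.is_file() and sav.is_file() and prj.parent == sav.parent
```

For example, `ready_to_run("c6288_load.vtakprj", "c6288_383.vtaksave")` returns True only when both files have been downloaded into the current folder.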

VeritakWinSV64_383B Build Oct.22.2010

5. Conclusion