
KIN DB 2004 Project - Stress Tests

Project hosted at SourceForge.net

 

NOTE: This page is deprecated and for reference only. Refer to the new tests page for updated information and results.
These tests were run with the routines as they stood on November 17th, to give an idea of how CPU and RAM affect the performance of some of the main (massive) Kin DataBase routines.

Most of the code for these tests can be found in stress.c. If you want to contribute or discuss any of the figures here, you may post a comment at the project's performance forum. For instance, it would be interesting to see results on an Athlon64 with fast RAM attached...
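As a reading aid for the listings below, here is a minimal sketch of how the B/s and i/s columns seem to follow from the "Spent (us)" column. It is inferred from the printed figures only, not taken from stress.c: the names rate_gi() and items_for() are hypothetical, and the assumption is that "G" stands for 2^30 units per second, with "i" meaning items of the width the routine handles (1, 2 or 4 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of items in a buffer of `bytes` bytes
 * when the routine processes `item_size` bytes at a time. */
static uint64_t items_for(uint64_t bytes, uint64_t item_size)
{
    assert(item_size > 0);
    return bytes / item_size;
}

/* Hypothetical helper: rate in units of 2^30 per second, given a count
 * (bytes or items) and the elapsed time in microseconds.
 * Assumption: the "G" suffix in the listings means 2^30, which is what
 * the printed figures appear to be consistent with. */
static double rate_gi(uint64_t count, uint64_t spent_us)
{
    assert(spent_us > 0);
    double per_second = (double)count * 1e6 / (double)spent_us;
    return per_second / (double)(1ULL << 30);
}
```

For example, rate_gi(268435456, 202722) gives about 1.233, in line with the 1.23G shown for bzero() on System 1, and rate_gi(items_for(268435456, 4), 202800) gives about 0.308, in line with the 308M shown for memset32_1(). The last printed digit may differ, since the rounding used by the stress application is not known here.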

System 1

This is a Pentium 4 HT 2.60GHz with 512KB cache (cpu_family: 15, model: 2, stepping: 9), 2x512MB (DDR400) in dual bank configuration, on an Intel i865G chipset (using shared video frame buffer):


Stress version 0.0.2, Copyright (C) 2005 David Lopez Vinacua
GetuT() resolution: 1us, speed: 2 calls per us
There's 257019 pages (4096 bytes each):  980MB
CPU-RAM performance (best of 3):
 Page allocation speed: 69us ( 884MP/s)
Function speed on  256MB  Spent (us)   B/s   i/s
 bzero()                W     202722 1.23G 1.23G
 memset32_1()           W     202800 1.23G  308M
 memset32_2()           W     190233 1.31G  328M
 scasb_1() (membyte1)   R     430196  581M  581M
 scasb_2() (membyte2)   R     330320  756M  756M
 scasb_3() (memchr)     R     146233 1.70G 1.70G
 scasw_1() (memshort1)  R     221978 1.12G  563M
 scasw_2() (memshort2)  R     180872 1.38G  691M
 scasl_1() (memint1)    R     126658 1.97G  493M
 scasl_2() (memint2)    R     121539 2.05G  514M
 ChkSum32_1()           R     126262 1.98G  495M  Res=60 (OK)
In-Cache performance (tests are performed 8 times each, results are best of 4):
Function speed on  8x64KB Spent (us)   B/s   i/s
 scasb_1() (membyte1)   R        806  605M  605M
 scasb_2() (membyte2)   R        605  807M  807M
 scasb_3() (memchr)     R        202 2.41G 2.41G
 scasw_1() (memshort1)  R        403 1.21G  605M
 scasw_2() (memshort2)  R        303 1.61G  805M
 scasl_1() (memint1)    R        202 2.41G  604M
 scasl_2() (memint2)    R        151 3.23G  808M
 ChkSum32_1()           R        102 4.78G 1.19G
CPU benchmarks (best of 12):
 488K loop w/counter      Spent (us)   B/s   i/s
 loop instruction                384       1.21G
 memory counter                  384       1.21G
 register counter                288       1.61G
  

System 2

This is a Pentium 4 HT 3.20GHz with 1024KB cache (cpu_family: 15, model: 4, stepping: 1), 2x1GB (DDR400) in dual bank configuration, on an Intel i865G chipset (NOT using shared video frame buffer):


Stress version 0.0.2, Copyright (C) 2005 David Lopez Vinacua
GetuT() resolution: 1us, speed: 2 calls per us
There's 519151 pages (4096 bytes each): 1.98GB
CPU-RAM performance (best of 3):
 Page allocation speed: 49us (1.24GP/s)
Function speed on  256MB  Spent (us)   B/s   i/s
 bzero()                W     153610 1.62G 1.62G
 memset32_1()           W     154237 1.62G  405M
 memset32_2()           W     138509 1.80G  451M
 scasb_1() (membyte1)   R     342098  730M  730M
 scasb_2() (membyte2)   R     328722  760M  760M
 scasb_3() (memchr)     R     121426 2.05G 2.05G
 scasw_1() (memshort1)  R     174767 1.43G  715M
 scasw_2() (memshort2)  R     158085 1.58G  790M
 scasl_1() (memint1)    R      94547 2.64G  661M
 scasl_2() (memint2)    R      92812 2.69G  673M
 ChkSum32_1()           R      76785 3.25G  813M  Res=60 (OK)
In-Cache performance (tests are performed 8 times each, results are best of 4):
Function speed on  8x64KB Spent (us)   B/s   i/s
 scasb_1() (membyte1)   R        652  748M  748M
 scasb_2() (membyte2)   R        629  776M  776M
 scasb_3() (memchr)     R        214 2.28G 2.28G
 scasw_1() (memshort1)  R        326 1.49G  748M
 scasw_2() (memshort2)  R        291 1.67G  838M
 scasl_1() (memint1)    R        164 2.97G  744M
 scasl_2() (memint2)    R        166 2.94G  735M
 ChkSum32_1()           R        116 4.20G 1.05G
CPU benchmarks (best of 12):
 488K loop w/counter      Spent (us)   B/s   i/s
 loop instruction                311       1.49G
 memory counter                  777        599M
 register counter                234       1.99G
  

System 3

This is an Athlon XP 2000+ (1.66GHz) with 256KB cache (cpu_family: 6, model: 8, stepping: 1), 1x256MB (DDR333, but at 2x133MHz) on a VIA KM266 chipset (using shared video frame buffer):


Stress version 0.0.2, Copyright (C) 2005 David Lopez Vinacua
GetuT() resolution: 1us, speed: 5 calls per us
There's 60172 pages (4096 bytes each):  229MB
CPU-RAM performance (best of 3):
 Page allocation speed: 27us (1.13GP/s)
Function speed on  128MB  Spent (us)   B/s   i/s
 bzero()                W     279803  446M  446M
 memset32_1()           W     279829  446M  111M
 memset32_2()           W     338480  369M 94.5M
 scasb_1() (membyte1)   R     357724  349M  349M
 scasb_2() (membyte2)   R     332922  375M  375M
 scasb_3() (memchr)     R     283395  441M  441M
 scasw_1() (memshort1)  R     278807  448M  224M
 scasw_2() (memshort2)  R     282722  442M  221M
 scasl_1() (memint1)    R     263147  475M  118M
 scasl_2() (memint2)    R     263290  474M  118M
 ChkSum32_1()           R     271597  460M  115M  Res=60 (OK)
In-Cache performance (tests are performed 8 times each, results are best of 4):
Function speed on  8x64KB Spent (us)   B/s   i/s
 scasb_1() (membyte1)   R        708  689M  689M
 scasb_2() (membyte2)   R        708  689M  689M
 scasb_3() (memchr)     R        365 1.33G 1.33G
 scasw_1() (memshort1)  R        355 1.37G  687M
 scasw_2() (memshort2)  R        355 1.37G  687M
 scasl_1() (memint1)    R        179 2.72G  681M
 scasl_2() (memint2)    R        178 2.74G  685M
 ChkSum32_1()           R        179 2.72G  681M
CPU benchmarks (best of 12):
 488K loop w/counter      Spent (us)   B/s   i/s
 loop instruction                899        517M
 memory counter                  900        517M
 register counter                599        777M
  

System 4

This is an Athlon 800MHz with 256KB cache (cpu_family: 6, model: 4, stepping: 2), 384MB (SDR133), on a VIA KT133 chipset (shared video frame buffer, but not used):


Stress version 0.0.2, Copyright (C) 2005 David Lopez Vinacua
GetuT() resolution: 1us, speed: 2 calls per us
There's 96636 pages (4096 bytes each):  368MB
CPU-RAM performance (best of 3):
 Page allocation speed: 114us ( 535MP/s)
Function speed on  256MB  Spent (us)   B/s   i/s
 bzero()                W     716643  348M  348M
 memset32_1()           W     716739  348M 89.2M
 memset32_2()           W     716439  348M 89.3M
 scasb_1() (membyte1)   R    1510134  165M  165M
 scasb_2() (membyte2)   R    1449868  172M  172M
 scasb_3() (memchr)     R    1094242  228M  228M
 scasw_1() (memshort1)  R    1177657  212M  106M
 scasw_2() (memshort2)  R    1136320  220M  110M
 scasl_1() (memint1)    R    1009123  247M 63.4M
 scasl_2() (memint2)    R     928329  269M 68.9M
 ChkSum32_1()           R    1012167  246M 63.2M  Res=68 (FAILED!)
In-Cache performance (tests are performed 8 times each, results are best of 4):
Function speed on  8x64KB Spent (us)   B/s   i/s
 scasb_1() (membyte1)   R       1465  333M  333M
 scasb_2() (membyte2)   R       1464  333M  333M
 scasb_3() (memchr)     R        755  646M  646M
 scasw_1() (memshort1)  R        735  664M  332M
 scasw_2() (memshort2)  R        733  666M  333M
 scasl_1() (memint1)    R        370 1.31G  329M
 scasl_2() (memint2)    R        368 1.32G  331M
 ChkSum32_1()           R        371 1.31G  329M
CPU benchmarks (best of 12):
 488K loop w/counter      Spent (us)   B/s   i/s
 loop instruction               1859        250M
 memory counter                 1859        250M
 register counter               1239        375M
  

System 5

This is a Pentium 4 Celeron 2.40GHz with 128KB cache (cpu_family: 15, model: 2, stepping: 9), 256MB (DDR333), on an SiS 645 chipset:


Stress version 0.0.2, Copyright (C) 2005 David Lopez Vinacua
GetuT() resolution: 1us, speed: 3 calls per us
There's 64199 pages (4096 bytes each):  244MB
CPU-RAM performance (best of 3):
 Page allocation speed: 29us (1.5GP/s)
Function speed on  128MB  Spent (us)   B/s   i/s
 bzero()                W     172700  723M  723M
 memset32_1()           W     172645  724M  181M
 memset32_2()           W     159622  783M  195M
 scasb_1() (membyte1)   R     178878  698M  698M
 scasb_2() (membyte2)   R     137472  909M  909M
 scasb_3() (memchr)     R      75514 1.65G 1.65G
 scasw_1() (memshort1)  R      95480 1.30G  654M
 scasw_2() (memshort2)  R      78653 1.58G  794M
 scasl_1() (memint1)    R      72909 1.71G  428M
 scasl_2() (memint2)    R      73058 1.71G  427M
 ChkSum32_1()           R      72784 1.71G  429M  Res=60 (OK)
In-Cache performance (tests are performed 8 times each, results are best of 4):
Function speed on  8x64KB Spent (us)   B/s   i/s
 scasb_1() (membyte1)   R        665  734M  734M
 scasb_2() (membyte2)   R        503  970M  970M
 scasb_3() (memchr)     R        191 2.55G 2.55G
 scasw_1() (memshort1)  R        337 1.44G  724M
 scasw_2() (memshort2)  R        259 1.88G  942M
 scasl_1() (memint1)    R        189 2.58G  645M
 scasl_2() (memint2)    R        157 3.11G  777M
 ChkSum32_1()           R        124 3.93G  984M
CPU benchmarks (best of 12):
 488K loop w/counter      Spent (us)   B/s   i/s
 loop instruction                313       1.48G
 memory counter                 2323        200M
 register counter                234       1.99G
  

System 6

This is a Pentium 4 2.40GHz with 512KB cache (cpu_family: 15, model: 2, stepping: 7), 256MB (DDR266), on an SiS 645DX chipset:


Stress version 0.0.2, Copyright (C) 2005 David Lopez Vinacua
GetuT() resolution: 1us, speed: 2 calls per us
There's 64128 pages (4096 bytes each):  244MB
CPU-RAM performance (best of 3):
 Page allocation speed: 44us ( 693MP/s)
Function speed on  128MB  Spent (us)   B/s   i/s
 bzero()                W     252665  494M  494M
 memset32_1()           W     252408  495M  123M
 memset32_2()           W     223235  559M  139M
 scasb_1() (membyte1)   R     236125  529M  529M
 scasb_2() (membyte2)   R     181539  688M  688M
 scasb_3() (memchr)     R      90086 1.38G 1.38G
 scasw_1() (memshort1)  R     124273 1.00G  502M
 scasw_2() (memshort2)  R      98259 1.27G  636M
 scasl_1() (memint1)    R      86265 1.44G  362M
 scasl_2() (memint2)    R      86419 1.44G  361M
 ChkSum32_1()           R      86870 1.43G  359M  Res=60 (OK)
In-Cache performance (tests are performed 8 times each, results are best of 4):
Function speed on  8x64KB Spent (us)   B/s   i/s
 scasb_1() (membyte1)   R        884  552M  552M
 scasb_2() (membyte2)   R        667  732M  732M
 scasb_3() (memchr)     R        219 2.22G 2.22G
 scasw_1() (memshort1)  R        438 1.11G  557M
 scasw_2() (memshort2)  R        328 1.48G  744M
 scasl_1() (memint1)    R        219 2.22G  557M
 scasl_2() (memint2)    R        165 2.95G  739M
 ChkSum32_1()           R        110 4.43G 1.10G
CPU benchmarks (best of 12):
 488K loop w/counter      Spent (us)   B/s   i/s
 loop instruction                472        986M
 memory counter                  955        487M
 register counter                313       1.48G
  

System 7

This is a Pentium 4 HT 3.00GHz with 1MB cache (cpu_family: 15, model: 3, stepping: 4), 512MB (DDR400), on an ATI RS300 chipset (with shared 128-bit video frame buffer):


Stress version 0.0.2, Copyright (C) 2005 David Lopez Vinacua
GetuT() resolution: 1us, speed: 2 calls per us
There's 112687 pages (4096 bytes each):  429MB
CPU-RAM performance (best of 3):
 Page allocation speed: 74us ( 824MP/s)
Function speed on  256MB  Spent (us)   B/s   i/s
 bzero()                W     294850  847M  847M
 memset32_1()           W     293855  850M  212M
 memset32_2()           W     250606  997M  249M
 scasb_1() (membyte1)   R     376145  664M  664M
 scasb_2() (membyte2)   R     362579  689M  689M
 scasb_3() (memchr)     R     137923 1.81G 1.81G
 scasw_1() (memshort1)  R     194428 1.28G  642M
 scasw_2() (memshort2)  R     171886 1.45G  727M
 scasl_1() (memint1)    R     142041 1.76G  440M
 scasl_2() (memint2)    R     139814 1.78G  447M
 ChkSum32_1()           R     124947 2.00G  500M  Res=60 (OK)
In-Cache performance (tests are performed 8 times each, results are best of 4):
Function speed on  8x64KB Spent (us)   B/s   i/s
 scasb_1() (membyte1)   R        702  695M  695M
 scasb_2() (membyte2)   R        675  723M  723M
 scasb_3() (memchr)     R        231 2.11G 2.11G
 scasw_1() (memshort1)  R        351 1.39G  695M
 scasw_2() (memshort2)  R        315 1.55G  775M
 scasl_1() (memint1)    R        181 2.69G  674M
 scasl_2() (memint2)    R        178 2.74G  685M
 ChkSum32_1()           R        121 4.03G 1.00G
CPU benchmarks (best of 12):
 488K loop w/counter      Spent (us)   B/s   i/s
 loop instruction                334       1.39G
 memory counter                  835        557M
 register counter                251       1.85G
  

Note: For times longer than a few hundred microseconds, task switching is very likely, so consecutive tests may give different results. This is annoying in a study, but it represents the real world (a real system will have task switching events). If you perform a test, try to keep the machine as idle as possible to minimize spurious time accounting. Before every test, the stress application tries to give the OS a chance to switch to other tasks, and results are usually stable (very small variations).
Note: System 4 fails the checksum test (it reports Res=68, while the other systems report Res=60 as OK). This still needs to be investigated...
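The measurement approach described in the first note (yield to the OS before each run, keep the best of several runs) can be sketched as follows. This is not the code from stress.c: now_us() is a hypothetical stand-in for GetuT(), built on clock_gettime() as an assumption, and best_of() and fill_zero() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for GetuT(): monotonic time in microseconds. */
static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* A routine under test takes a buffer and its length in bytes. */
typedef void (*bench_fn)(void *buf, size_t len);

/* Example routine: zero the buffer, comparable in spirit to bzero(). */
static void fill_zero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Run `fn` `runs` times and return the shortest elapsed time in
 * microseconds. Yielding first gives the scheduler a chance to run
 * other tasks before the timed region rather than inside it. */
static uint64_t best_of(bench_fn fn, void *buf, size_t len, int runs)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        sched_yield();
        uint64_t start = now_us();
        fn(buf, len);
        uint64_t spent = now_us() - start;
        if (spent < best)
            best = spent;
    }
    return best;
}
```

Calling best_of(fill_zero, buf, len, 3) would correspond to a "best of 3" figure. Taking the minimum rather than the average discards runs that were interrupted by a task switch, which is why it tends to give stable numbers on a mostly idle machine.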

Some interesting conclusions

After examining those tests, some funny conclusions arise:
