From: S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa)
Newsgroups: comp.sys.ibm.pc.hardware,comp.sys.intel
Subject: Performance comparison Intel 386DX, Cyrix 486DLC, C&T 38600DX, Intel R
Message-ID: <1j69c9INN8qg@iraul1.ira.uka.de>
Date: 15 Jan 1993 12:06:33 GMT
Organization: University of Karlsruhe, FRG
Lines: 511

Performance Comparison Intel 386DX, Intel RapidCAD, C&T 38600DX, Cyrix 486DLC

This document, containing descriptions and a benchmark comparison of the Intel 386DX, Intel RapidCAD, C&T 38600DX, and the Cyrix 486DLC has been put together for the benefit of the net.community. I believe, but cannot guarantee, that all data given herein is correct. If you would like to report errors in this document, or have suggestions for its enhancement, feel free to contact me at the following email address:

  S_JUFFA@IRAVCL.IRA.UKA.DE

You can also write to me at my snail mail address:

  Norbert Juffa
  Wielandstr. 14
  7500 Karlsruhe 1
  Germany

Corrections pertaining to spelling or grammatical errors are also encouraged, especially from those net.users that are native speakers of (American) English.

This is the second version of this document. Thanks to the people that helped improve it by their comments on the previous version:

  Alex Martelli (martelli@cadlab.sublink.org),
  Fred Dunlap (cyrix!fred@texsun.Central.Sun.COM)

A special thanks for editing this article goes to:

  David Ruggiero (osiris@halcyon.halcyon.com)

---------------------------------------------------------------------------

In 1992, three new CPUs were introduced into the PC marketplace. Each has the capability to replace the standard Intel 386DX CPU and provide higher performance for existing 386 systems. These new chips are the Intel RapidCAD, the Chips&Technologies (C&T) Super386 38600DX, and the Cyrix 486DLC. At the moment, all of these are only available in 33 MHz versions, but C&T and Cyrix have announced 40 MHz versions for delivery in early 1993.
(Of course you could try to push existing 33 MHz chips to 40 MHz, but I do not recommend this, as my experience shows that operation at the higher frequency tends to be unreliable, with the possibility of the machine locking up unexpectedly.)

While the Intel RapidCAD is marketed only as an end-user product (which means PC manufacturers will not ship systems with the RapidCAD already installed), both C&T and Cyrix sell their products directly to the end user as well as to PC motherboard manufacturers. Cyrix also manufactures the 486SLC, which is a replacement for the Intel/AMD 386SX CPUs. C&T has announced a 38600SX chip, but it will not ship before sometime in 1993, if at all.

The Intel RapidCAD
------------------

The Intel RapidCAD is, in rough terms, an Intel 486DX with the 8 kB on-chip cache removed and a 386 pinout. To software, it appears to be a 386DX with a math coprocessor, as the 486-specific instructions have been removed from its instruction set. It is marketed by Intel as "the ultimate coprocessor" and, as the name implies, its main purpose is to replace the 386DX in existing systems and thereby boost the performance of floating-point intensive applications such as CAD software, spreadsheets, and math packages (e.g. SPSS, Mathematica).

RapidCAD is delivered as a set of two chips. RapidCAD-1 is a 132-pin PGA (pin grid array) chip that goes into the CPU socket and replaces the 386DX. It contains the CPU and the FPU (floating-point unit). RapidCAD-2 is a 68-pin PGA chip that goes into the 387 coprocessor (or EMC) socket; it contains a simple PLA whose only purpose is to generate the FERR signal for the motherboard logic. This provides full PC-compatible handling of floating-point exceptions.

Many CPU instructions execute in one clock cycle in the RapidCAD, just as on the Intel 486. The RapidCAD is constrained, however, by use of the standard 386DX bus interface, since every bus cycle takes two CPU clock cycles.
This means that instructions may be executed by the RapidCAD faster than they can be fetched from memory. But as most FPU instructions take longer to execute than CPU instructions, they are not greatly slowed down by this slow bus interface, and most of them execute in the same time as in the Intel 486DX. Therefore, the Intel RapidCAD provides higher overall floating-point performance than any existing 386/387 combination. Results from the SPEC benchmarks, a common workstation UNIX benchmark suite, show that the RapidCAD provides 85% more floating-point and 15% more integer performance than an Intel 386DX/387DX combination at the same clock frequency.

Power consumption of the RapidCAD is typically 3500 mW at 33 MHz. The current street price of the 33 MHz version is ~US$ 300.

The C&T 38600DX
---------------

The Chips&Technologies 38600DX is designed to be 100% compatible with the Intel 386DX CPU. Unlike AMD's Am386 CPUs (which use microcode that is identical to the Intel 386DX's microcode), the C&T 38600DX uses new microcode developed within C&T using "clean-room" techniques. C&T even included the 386DX's "undocumented" LOADALL386 instruction in the instruction set to provide full compatibility with the Intel chip. Some instructions execute faster on the 38600DX than on the 386DX. C&T also produces a 38605DX CPU that includes a 512-byte instruction cache and provides further performance increases over the Intel part. The 38605DX needs a bigger socket (144-pin PGA) and is therefore *not* pin-compatible with the 386DX.

In my tests, I found that the 38600DX has severe problems with the CPU-coprocessor communication, which cause its floating-point performance to drop below the level provided by the Intel 386DX/Intel 387DX for most programs. This problem exists with all available 387 compatible coprocessors (ULSI 83C87, IIT 3C87, Cyrix EMC87, Cyrix 83D87, Cyrix 387+, C&T 38700, and the Intel 387DX).
(A net.acquaintance also did tests with the 38600DX and arrived at similar results. He contacted C&T and they said they were aware of the problem.)

Typical power consumption of the 40 MHz 38600DX is 1650 mW, which is less than the typical power consumption of the Intel 386DX at 33 MHz (2000 mW). The 38600DX currently sells for ~US$ 80 for the 33 MHz version.

The Cyrix 486DLC
----------------

The Cyrix 486DLC is the latest entry into the market of 386DX replacements. It features a 486SX compatible instruction set, a 1 kB on-chip cache, and a 16x16-bit hardware multiplier. The RISC-like execution unit of the 486DLC executes many instructions in a single clock cycle. The hardware multiplier multiplies 16-bit quantities in 3 clock cycles, as compared to 12-25 cycles on a standard Intel 386DX. This is especially useful in address calculations (code from non-optimizing compilers may contain many MUL instructions for array accesses) and for software floating-point arithmetic (e.g., a coprocessor emulator).

The 1 kB cache helps the 486DLC to overcome the limitations of the 386 bus interface, although its hit rate averages only about 65% under normal program conditions. In existing 386 systems, DMA transfers (such as those performed by a SCSI controller or a sound board) may force the internal cache of the 486DLC to be flushed as the only means available in a 386 system to enforce consistency between the contents of the on-chip cache and external memory. This stems from the fact that the 386 bus interface was designed without provisions for an on-chip cache. This problem can reduce the performance of the 486DLC in systems that do a sizeable amount of (bus master) DMA. Cyrix has, however, defined some additional cache control signals for some of the 486DLC lines; they can be used to improve communications between the on-chip cache and a larger external cache or main memory and prevent this problem.
Although current 386 systems ignore these signals (since they are not defined for the Intel 386DX), future systems designed with the Cyrix chip in mind may take advantage of them and thereby gain increased performance.

The Cyrix cache is a unified data/instruction write-through type and can be configured as either a direct-mapped or 2-way set associative cache. It permits definition of up to four non-cacheable regions, which are particularly useful if a system has memory mapped peripherals (e.g., the Weitek math coprocessor). For compatibility reasons, the cache is disabled after a processor reset and must be enabled with the help of a small program provided by Cyrix. This prevents problems with BIOSes that are not "486DLC aware". (I am certain that future versions of the AMI BIOS and other BIOSes will take the 486DLC into consideration and directly support the Cyrix chip's cache.)

The 486DLC will not work correctly with all math coprocessors in all circumstances, with protected mode multitasked environments (e.g. MS-Windows 386-enhanced mode) being especially critical. Using the 486DLC with the Cyrix EMC87, Cyrix 83D87 (chips produced prior to November 1991), and IIT 3C87, I have been able to completely lock up the machine due to synchronization problems between the CPU and the coprocessor while executing the FSAVE or FRSTOR instructions (which are used to save and restore the coprocessor status during task switches). According to Cyrix, this problem only occurs with the first revision of the 486DLC and is fixed on newer ones. To be on the safe side, the 486DLC is best used with the Cyrix 387+ (its "Europe-only" name) or with the identical Cyrix 83D87 (US-bound chips manufactured after October 1991): these are not only the highest performing 387 coprocessors on the market, but they also work properly even with the first generation 486DLC.

If you already have a Cyrix 83D87 coprocessor and want to know whether it is the old or new type, I recommend you use my COMPTEST program, available as CTEST257.ZIP via anonymous ftp from garbo.uwasa.fi and other fine ftp servers. If COMPTEST reports a 387+, you either have the 387+ or the identical newer version of the 83D87 installed and can use any version of the Cyrix 486DLC without problems. If you believe that you may have problems with a 486DLC/387 combination, I suggest you contact Cyrix technical support (1-800-FAS-MATH in the US).

Power consumption of a 40 MHz 486DLC is typically 2800 mW. The 486DLC sells for ~US$ 115 for the 33 MHz version, according to its German distributor.

Tests and benchmarks
--------------------

HW configuration:

33.3/40 MHz motherboard with Forex chip set and AMI BIOS. 128 kB zero-wait-state, direct mapped, write-through CPU cache with one write buffer, 4 bytes per cache line, and 4 clock cycles penalty for a cache line miss. 8 MB of main memory with an average access time of 1.6 wait states. Cyrix EMC87 in 387 compatibility mode as math coprocessor. (This and the Cyrix 83D87 / 387+ are the fastest coprocessors available for use with the 386DX/486DLC/38600DX.) Conner 3204F hard disk, 203 MB capacity, IDE interface (CORETEST throughput 1100 kB/s, seek time 16 ms). Diamond SpeedSTAR HiColor, ISA bus SVGA card using Tseng's ET4000 chip, 1 MB DRAM as frame buffer, *no* accelerator. The switches on the card were set for fastest reliable operation, with a throughput of 6500 bytes/ms at 40 MHz and 5400 bytes/ms at 33.3 MHz.

SW configuration:

MS-DOS 5.0, MS Windows 3.1, HyperDisk 4.32 disk cache program in write-back mode, using 2 MB of extended memory, 386MAX 6.01 used as memory manager and DPMI provider in some benchmarks. Latest Tseng (Colorview) driver for Windows 3.1 at 1024x768x256, using the 8514 fonts.

For the Whetstone, Dhrystone, WINTACH, DODUC, LINPACK, LLL, and Savage benchmarks, *higher* numbers indicate *faster* performance. For the MAKE RTL, MAKE TRANCK, and String-Test benchmarks, *lower* numbers indicate *faster* performance.

                               Intel       C&T     Intel     Cyrix     Cyrix
33.3 MHz                       386DX   38600DX  RapidCAD    486DLC    486DLC
                                                         cache off  cache on
integer
Whetstone [kWhets/s]             447       585       563       695       803
Dhrystone (C) [Dhry./s]        11688     11819     12357     14150     15488
Dhrystone (Pas) [Dhry./s]      10455     10877     10751     12154     13858
String-Test [ms]                 459       453       441       347       327
MAKE RTL [s]                   51.32     47.10     46.34     43.45     39.13
MAKE TRANCK [s]                62.42     55.47     55.37     53.64     46.12
WINTACH [overall RPM]           4.85      4.90      5.49      5.53      6.14

float
DODUC [Rapidity index]          79.0      76.4     150.3      89.4      90.7
LINPACK [MFLOPS]              0.2808    0.2707    0.4578    0.3158    0.3438
LLL [MFLOPS]                  0.3352    0.3537    0.6083    0.3816    0.4139
Whetstone [kWhets/s]            2540      2340      3990      2908      3061
Savage [function eval/s]       71685     53191     72464     88757     93897

                               Intel       C&T     Intel     Cyrix     Cyrix
40.0 MHz                       386DX   38600DX  RapidCAD    486DLC    486DLC
                                                         cache off  cache on
integer
Whetstone [kWhets/s]             536       702       676       835       963
Dhrystone (C) [Dhry./s]        14128     14116     14836     16987     18750
Dhrystone (Pas) [Dhry./s]      12490     13067     12890     14573     16624
String-Test [ms]                 384       377       368       289       273
MAKE RTL [s]                   43.46     40.11     39.84     37.25     33.54
MAKE TRANCK [s]                53.00     47.59     47.07     45.36     39.00
WINTACH [overall RPM]           5.65      5.73      6.41      6.46      7.23

float
DODUC [Rapidity index]          94.9      77.5     180.3     105.1     106.6
LINPACK [MFLOPS]              0.3324    0.3260    0.5418    0.3789    0.4131
LLL [MFLOPS]                  0.4025    0.4204    0.7263    0.4562    0.4956
Whetstone [kWhets/s]            3061      2632      4798      3505      3677
Savage [function eval/s]       86083     49587     86957    106762    112360

To complete the picture, I ran the CPU/FPU standard benchmarks on an Intel 486DX running at 33.3/40 MHz. Since the 486 machine was configured with a different hard disk than the 386 system, and the compilers and tools installed on the 386 machine were not present, the MAKE benchmarks could unfortunately not be included in the tests.

486DX, 256 kBytes CPU cache (write-thru), 8 MB of RAM, AMI-BIOS, Diamond SpeedSTAR HiColor (VIDSPEED throughput: 6500 bytes/ms at 40 MHz and 5400 bytes/ms at 33.3 MHz), MS-DOS 5.0, MS-Windows 3.1:

                            33.3 MHz    40 MHz
integer
Whetstone [kWhets/s]             707       848
Dhrystone (C) [Dhry./s]        19394     23265
Dhrystone (Pas) [Dhry./s]      16978     20368
String-Test [ms]                 333       279
WINTACH [overall RPM]           8.59     10.14

float
DODUC [Rapidity index]         184.0     220.7
LINPACK [MFLOPS]              0.6682    0.8204
LLL [MFLOPS]                  0.9387    1.1110
Whetstone [kWhets/s]            5143      6195
Savage [function eval/s]       82192     98522

Conclusions
-----------

The Cyrix 486DLC is the 386DX replacement with the highest integer performance. With the internal cache enabled, integer performance of the 486DLC can be up to 80% higher compared with an Intel 386DX at the same clock frequency, with the average speed gain for integer applications being 35%. Enabling the internal cache provides about 5-15% more performance than with the cache disabled for both integer and floating-point applications. Floating-point applications are accelerated by about 15%-30% if the Cyrix 486DLC (with cache enabled) is used instead of the Intel 386DX. Compared with the Intel 486DX, the Cyrix 486DLC provides about 70% of the integer performance and about 50% of the floating-point performance at the same clock frequency.

The Intel RapidCAD is the 386DX replacement that provides the highest floating-point performance. It can speed up most floating-point intensive programs by 60%-90% compared with the fastest Intel 386DX/math coprocessor combination; it provides nearly 75% of the floating-point performance of an Intel 486DX at the same clock frequency. Integer performance increases by an average of 15% when the RapidCAD is used instead of the standard Intel 386DX, with the maximum performance gain being 35%.

The Chips&Technologies 38600DX has a slightly higher integer performance than the Intel 386DX, with the speedup ranging from 0%-30% and an average speedup on the order of 10%.
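
The percentages quoted in the conclusions can be rechecked from the tables with a few lines of arithmetic. The following is only a minimal sketch in Python: the dictionary layout and the function names are my own choices for illustration, and the numbers are copied from the 33.3 MHz table (Intel 386DX versus Cyrix 486DLC with cache on, integer benchmarks). Since some benchmarks report times and others report rates, the sketch inverts the ratio for the "lower is faster" ones.

```python
# Integer results at 33.3 MHz, copied from the table in this article:
# (Intel 386DX, Cyrix 486DLC with cache on, True if a higher number is faster)
INTEGER_33MHZ = {
    "Whetstone (integer)": (447, 803, True),
    "Dhrystone (C)": (11688, 15488, True),
    "Dhrystone (Pas)": (10455, 13858, True),
    "String-Test": (459, 327, False),
    "MAKE RTL": (51.32, 39.13, False),
    "MAKE TRANCK": (62.42, 46.12, False),
    "WINTACH": (4.85, 6.14, True),
}


def speedup(baseline, candidate, higher_is_faster):
    """Return how many times faster the candidate is than the baseline."""
    if higher_is_faster:
        return candidate / baseline
    # For timings, a smaller value is better, so the ratio is inverted.
    return baseline / candidate


def speedups(table):
    """Map each benchmark name to its speedup factor."""
    return {name: speedup(b, c, h) for name, (b, c, h) in table.items()}


if __name__ == "__main__":
    results = speedups(INTEGER_33MHZ)
    for name, factor in results.items():
        print(f"{name:22s} {100 * (factor - 1):5.1f}% faster")
    mean = sum(results.values()) / len(results)
    print(f"largest speedup:         {max(results.values()):.2f}x")
    print(f"arithmetic mean speedup: {mean:.2f}x")
```

The largest factor comes from the integer Whetstone row (803/447), which corresponds to the "up to 80%" figure. The mean printed by this sketch depends on which benchmarks are included and how they are averaged, so it need not match the average quoted in the conclusions exactly.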

Description of benchmarks
-------------------------

DHRYSTONE [9] is a synthetic benchmark developed by R. Weicker from Siemens in 1984. The frequency of operations and data types used by Dhrystone are modeled after statistics collected for 'typical' programs that are written in a HLL (high level language) such as C or Pascal and that do not use floating-point arithmetic. Thus there is a certain distribution of global and local variables being used in procedures, there is a certain percentage of use of records (structs) or strings out of the total number of variables accessed, and there is a certain percentage of procedure calls and if-statements out of the total lines of code executed. All these percentages match the statistics used in the development of Dhrystone quite closely. To preserve the typical distributions, the measurement rules for Dhrystone forbid function inlining (a frequently used optimization technique optionally performed by most optimizing compilers). The current version of Dhrystone is 2.1, and this is the version used for my tests. Version 2.1 [10] differs from previous versions in that it contains additional code that prevents optimizing compilers from throwing out most of the code by dead code elimination (since Dhrystone has no input file, many expressions can be computed at compile time), which would artificially inflate performance numbers. The Dhrystone benchmark exists in equivalent Ada, Pascal and C versions, which are all available from netlib@ornl.gov. Dhrystone measures the time to execute its main loop and relates this time to a fixed reference time, the result being a performance index given in "Dhrystones per second". I used two Dhrystone executables. One was compiled using the non-optimizing Turbo Pascal 6.0 compiler, the other one was compiled with the MS-C 7.0 compiler using the large memory model and maximum optimization but without the use of function inlining.
Because the Turbo Pascal executable uses more memory operands and the MS-C executable uses more register-to-register operations, the speedup for the executables on different CPUs can be noticeably different.

WINTACH is a public domain benchmark program by Texas Instruments that measures the speed of graphics output for four typical MS-Windows applications: a word processor, a spreadsheet, a CAD program, and a paint program. TI has collected profiles for each class of applications under 'typical' user loads. The four parts of the WINTACH benchmark were then modeled after these profiles, so that the percentage of certain GDI (graphics device interface) calls found in the profiles is also present in the benchmark. WINTACH determines the performance of each program part relative to the performance of a standard VGA card in a 20 MHz 386DX machine. It also combines the four values into an overall relative performance index, which is reported in my tables. Using this single number to represent graphics performance is justified in this case since the four partial results do *not* deviate much from the average for the SVGA tested (a Diamond SpeedSTAR HiColor). I used a frame buffer card for the test because for this type of graphics card, graphics performance depends solely on the performance of the CPU, which handles all accesses to the display memory. On an accelerated card, some of the low-level operations are delegated to the accelerator chip and the influence of the CPU speed on the graphics performance is not seen so clearly. For this test, I used the latest Tseng Windows 3.1 driver (Colorview) at a resolution of 1024x768x256 with the large 8514 fonts (WINTACH code C8).

MAKE RTL is not a single program. Rather it is a build of a complete project, my own run-time library for Turbo Pascal 6.0. The project consists of about 200 source files with a total size of ~650 kByte.
Most of the source files are assembly language (.ASM) files, while there are also some Pascal (.PAS) files. Building the complete project using MAKE produces about 200 binary files (.OBJ and .TPU) with a combined size of ~300 kB. The programs used during the build are MAKE 3.5, TPC 6.01, and TASM 2.01, all by Borland, Inc. All files are read from and written to the hard disk, using a 2 MB write-back disk cache provided by the HyperDisk 4.32 program. This eliminates most of the I/O overhead. No memory manager was installed during the test. The time reported is the time from starting the MAKE utility to the reappearance of the DOS prompt. MAKE RTL can be thought of as being typical of applications that operate on a lot of small files and use only integer instructions.

MAKE TRANCK is a project build for a project consisting of two assembly language modules, two C modules and one FORTRAN module. The combined size of the source files is approximately 120 kBytes. All modules are compiled (assembled) and combined into a single program, linking with both the C and the FORTRAN libraries. Programs used are MAKE 3.5 by Borland and MASM 6.0, MS-C 7.0, MS-FORTRAN 5.0, and LINK 5.3, all by Microsoft, Inc. The C compiler and the linker run in protected mode and require a DPMI host to be present. 386MAX 6.01 by Qualitas was installed for this purpose. To minimize I/O overhead, the HyperDisk 4.32 disk caching program was installed, with 2 MByte of extended memory allocated to the cache. The modules are compiled with compiler switches set for maximum optimization, and most of the time for the project build is spent in the optimizing stages of the compilers.

STRING-TEST is a simple benchmark that tests the string handling functions built into the Turbo Pascal language. Nearly all of the execution time is spent in repeated execution of the STOS, CMPS, SCAS, and MOVS instructions. The data and code fit into about five kBytes of memory.
In cached systems, almost all memory accesses will be cache hits. The program was written and compiled with Turbo Pascal 6.0 and linked with my own run-time library, which provides much faster string functions than the run-time library delivered by Borland. Compiler switches were set for fastest execution. The time reported is the time to complete the whole benchmark in milliseconds, with an accuracy of +/-1 millisecond.

LLL is short for Lawrence Livermore Loops [8], a set of kernels taken from real floating-point intensive programs. Some of these loops are vectorizable, but since we don't deal with vector processors here, this doesn't matter. For this test, LLL was adapted from the FORTRAN original in [7] to Turbo Pascal. By variable overlaying (similar to FORTRAN's EQUIVALENCE statement) memory allocation for data was reduced to 64 kB, so all data fits into a single 64 kB segment. The older version of LLL, which contains 14 loops, is used here. There also exists a newer, more elaborate version consisting of 24 kernels. The kernels in LLL exercise only multiplication and addition. The MFLOPS rate reported is the average of the MFLOPS rates of all 14 kernels as reported by the LLL program. LLL and Whetstone results (see below) are reported as returned by my COMPTEST test program, in which they have been included as a measure of coprocessor/FPU performance. COMPTEST has been compiled under Turbo Pascal 6.0 with all 'optimizations' on and using my own run-time library, which gives higher performance than the one included with TP 6.0. My library is available as TPL60N17.ZIP from garbo.uwasa.fi and ftp-sites that mirror this site. All floating-point variables in the program were declared as DOUBLE.

LINPACK [4] is a well known floating-point benchmark that also heavily exercises the memory system. Linpack operates on large matrices and takes up about 570 kB in the version used for this test. This is about the largest program size a pure DOS system can accommodate.
Linpack was originally designed to estimate performance of BLAS, a library of FORTRAN subroutines that handles various vector and matrix operations. Note that vendors are free to supply optimized versions (e.g. in assembly language) of BLAS. Linpack uses two routines from BLAS which are thought to be typical of the matrix operations used by BLAS. Both routines only use addition/subtraction and multiplication. The FORTRAN source code for Linpack can be obtained from the automated mail server netlib@ornl.gov. Linpack was compiled using MS FORTRAN 5.0 in the HUGE memory model (which can handle data structures larger than 64 kB) and with compiler switches set for maximum optimization. Linpack performs the same test repeatedly. The number reported is the maximum MFLOPS rate returned by Linpack. All floating-point variables in the program were declared as DOUBLE. Linpack MFLOPS ratings for a great number of machines are contained in [5]. This PostScript document is also available from netlib@ornl.gov.

WHETSTONE [1,2,3] is a synthetic benchmark based on statistics collected about the use of certain control and data structures in programs written in high level languages. Based on these statistics, Whetstone tries to mirror a 'typical' HLL program. Whetstone performance is expressed by how many theoretical 'whetstone' instructions are executed per second. It was originally implemented in ALGOL. Unlike LLL and Linpack, Whetstone not only uses addition and multiplication but exercises all basic arithmetic operations as well as some transcendental functions. Whetstone performance depends on the speed of the coprocessor as well as on the speed of the CPU, while LLL and Linpack place a heavier burden on the coprocessor/FPU. There exist an old and a new version of Whetstone. Note that results from the two versions can differ by as much as 20% for the same test configuration. For this test, the new version in Pascal from [2] was used.
It was compiled with Turbo Pascal 6.0 and my own library (see above) with all 'optimizations' on. For the integer test, software floating-point arithmetic using the REAL type was utilized. Using the software arithmetic exercises only the CPU, in particular the execution of MOV, SHL, SHR, RCR, ADD, SUB, MUL, and DIV instructions. For the floating-point test, the hardware floating-point arithmetic of the coprocessor was used and computations were performed using the DOUBLE type.

SAVAGE tests the performance of transcendental function evaluation. It is basically a small loop in which the sin, cos, arctan, ln, exp, and sqrt functions are combined in a single expression. While sin, cos, arctan, and sqrt can be evaluated directly with a single 387 coprocessor instruction each, ln and exp need additional preprocessing for argument reduction and result conversion. According to [11], the Savage benchmark was devised by Bill Savage, and is distributed by: The Wohl Engine Company, Ltd., 8200 Shore Front Parkway, Rockaway Beach, NY 11693, USA. Usually, Savage is programmed to make 250,000 passes through the loop. Here only 10,000 loops are executed for a total of 60,000 transcendental function evaluations. The result is expressed in function evaluations per second. SAVAGE source code was taken from [6] and compiled with Turbo Pascal 6.0 and my own run-time library (see above).

DODUC [12] is a modified application program by Nhuan Doduc that is also part of the SPEC benchmark suite. It is a nuclear safety analysis code that simulates the time evolution of a thermohydraulic modelization for a nuclear reactor's component, and the benchmark is created from the computational kernel of the original application. The benchmark consists of 5323 FORTRAN statement lines and the executable size is about 350 kBytes. Almost all processing is done using double precision floating-point values. There is not much array processing.
Instead the program has an iterative structure with an abundance of short branches and small loops. The version used for this test was compiled with the highly optimizing NDP FORTRAN V3.0 compiler from Microway, which generates a 32-bit mode executable that runs in protected mode using a protected mode loader provided by Microway. The execution time of the DODUC program is related to fixed reference times and the result is scaled to give a "Rapidity index".

References
----------

[1]  Curnow, H.J.; Wichmann, B.A.: A synthetic benchmark. Computer Journal, Vol. 19, No. 1, 1976, pp. 43-49
[2]  Wichmann, B.A.: Validation code for the Whetstone benchmark. NPL Report DITC 107/88, National Physical Laboratory, UK, March 1988
[3]  Curnow, H.J.: Whither Whetstone? The Synthetic Benchmark after 15 Years. In: Aad van der Steen (ed.): Evaluating Supercomputers. London: Chapman and Hall 1990
[4]  Dongarra, J.J.: The Linpack Benchmark: An Explanation. In: Aad van der Steen (ed.): Evaluating Supercomputers. London: Chapman and Hall 1990
[5]  Dongarra, J.J.: Performance of Various Computers Using Standard Linear Equations Software. Report CS-89-85, Computer Science Department, University of Tennessee, March 11, 1992
[6]  Huth, N.: Dichtung und Wahrheit oder Datenblatt und Test. Design & Elektronik 1990, Heft 13, Seiten 105-110
[7]  Esser, R.; Kremer, F.; Schmidt, W.G.: Testrechnungen auf der IBM 3090E mit Vektoreinrichtung. Arbeitsbericht RRZK-8803, Regionales Rechenzentrum an der Universität zu Köln, Februar 1988
[8]  McMahon, F.H.: The Livermore Fortran Kernels: A test of the numerical performance range. Technical Report UCRL-53745, Lawrence Livermore National Laboratory, USA, December 1986
[9]  Weicker, R.P.: Dhrystone: A Synthetic Systems Programming Benchmark. Communications of the ACM, Vol. 27, No. 10, October 1984, pp. 1013-1030
[10] Weicker, R.P.: Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules. SIGPLAN Notices, Vol. 23, No. 8, August 1988, pp. 49-62
[11] FasMath 83D87 Benchmark Report. Cyrix Corporation, June 1990, Order No. B2004
[12] Doduc, Nhuan: Fortran Execution Time Benchmark. Unpublished manuscript, version 45, available from ndoduc@framentec.fr