AI Benchmark Database - LLM Results
Generated tok/s | TTFT (s) | AI Processor | CPU | Backend | Engine | AI Model | Precision | Quantization | Context | Prompt Tokens | Generated Tokens | Concurrency | Requests | AI Framework | AI Framework Version | Operating System | Python Version | Profile | Benchmark Spec Version | SAIB Version | Status | Benchmark Date
166.73 | 0.006 | Apple MPS | Apple M2 Pro | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 512 | 128 | 256 | 1 | 5 | pytorch-simple-transformer | 2.11.0 | macOS-26.3.1-arm64-arm-64bit | 3.10.20 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx512_c1_v2 | 2.0 | 0.6.0 | Known profile | June 1, 2026
213.03 | 0.006 | Apple MPS | Apple M2 Pro | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 128 | 256 | 1 | 30 | pytorch-simple-transformer | 2.11.0 | macOS-26.3.1-arm64-arm-64bit | 3.10.20 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx4096_c1_v2 | 2.1 | 0.7.0 | Known profile | June 2, 2026
1370.83 | 0.001 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 128 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx4096_c1_v2 | 2.1 | 0.7.1 | Known profile | June 2, 2026
871.30 | 0.001 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 128 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-106-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx4096_c1_v2 | 2.1 | 0.7.1 | Known profile | June 2, 2026
122.95 | 0.008 | NVIDIA A100 80GB PCIe | AMD EPYC 7763 64-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-40-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 2, 2026
73.38 | 0.032 | NVIDIA A100 80GB PCIe | AMD EPYC 7763 64-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-40-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 2, 2026
275.07 | 0.003 | NVIDIA GeForce RTX 5090 | AMD EPYC 9354 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-63-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026
134.23 | 0.028 | NVIDIA GeForce RTX 5090 | AMD EPYC 9354 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-63-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026
56.77 | 0.053 | NVIDIA GeForce RTX 5090 | AMD EPYC 9354 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-63-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026
237.40 | 0.004 | NVIDIA H200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-53-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026
140.60 | 0.013 | NVIDIA H200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-53-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026
102.84 | 0.009 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.9.0 | Known profile | June 3, 2026
90.31 | 0.052 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.9.0 | Known profile | June 3, 2026
36.27 | 0.107 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.9.0 | Known profile | June 3, 2026
90.26 | 0.010 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-57-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
89.34 | 0.059 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-57-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
91.08 | 0.010 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
96.44 | 0.058 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
181.38 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
78.42 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
180.67 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
78.68 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
180.89 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
78.34 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
91.30 | 0.010 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-60-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
302.77 | 0.003 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
260.50 | 0.018 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
235.35 | 0.004 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8470 | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-90-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
184.68 | 0.013 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8470 | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-90-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
157.13 | 0.005 | NVIDIA L40S | AMD EPYC 9374F 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-110-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
156.42 | 0.023 | NVIDIA L40S | AMD EPYC 9374F 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-110-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
185.52 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7352 24-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-1059-oracle-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
76.95 | 0.032 | NVIDIA GeForce RTX 4090 | AMD EPYC 7352 24-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-5.15.0-1059-oracle-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
322.26 | 0.003 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
123.84 | 0.010 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
52.53 | 0.017 | NVIDIA RTX A4000 | AMD EPYC 7453 28-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
83.92 | 0.102 | NVIDIA RTX A4000 | AMD EPYC 7453 28-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
118.11 | 0.040 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
38.32 | 0.062 | NVIDIA GeForce RTX 4090 | AMD EPYC 7663 56-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026
115.81 | 0.040 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
31.62 | 0.065 | NVIDIA GeForce RTX 4090 | AMD EPYC 7543 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.4.1+cu124 | Linux-6.8.0-40-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
99.73 | 0.009 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
92.37 | 0.053 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
35.80 | 0.108 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
322.63 | 0.003 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
175.69 | 0.010 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
83.94 | 0.021 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
16.99 | 0.063 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
262.66 | 0.003 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
141.85 | 0.012 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
61.99 | 0.026 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
13.17 | 0.081 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
276.98 | 0.003 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
137.85 | 0.028 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
58.54 | 0.053 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
11.72 | 0.092 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
193.12 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
89.16 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
36.76 | 0.064 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
8.50 | 0.126 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026
2059.80 | 0.302 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 9654 96-Core Emb Processor | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | bf16 | none | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
2726.03 | 0.291 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 9654 96-Core Emb Processor | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | fp8 | fp8 | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
4891.73 | 0.284 | NVIDIA B200 | Intel(R) Xeon(R) 6960P | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | bf16 | none | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-124-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
5406.31 | 0.291 | NVIDIA B200 | Intel(R) Xeon(R) 6960P | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | fp8 | fp8 | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-124-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
418.83 | 0.017 | NVIDIA B200 | Intel(R) Xeon(R) 6960P | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-124-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
307.58 | 0.020 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-106-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
209.19 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7542 32-Core Processor | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-64-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
618.24 | 0.060 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | huggingface-causal | pytorch | Qwen/Qwen2.5-0.5B-Instruct | BF16 | none | 4096 | 2048 | 256 | 8 | 96 | huggingface-causal | 2.4.1+cu124 | Linux-6.8.0-106-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_pytorch_qwen/qwen2.5-0.5b-instruct_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
381.68 | 0.166 | NVIDIA GeForce RTX 4090 | AMD EPYC 7542 32-Core Processor | huggingface-causal | pytorch | Qwen/Qwen2.5-0.5B-Instruct | BF16 | none | 4096 | 2048 | 256 | 8 | 96 | huggingface-causal | 2.4.1+cu124 | Linux-6.8.0-64-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_pytorch_qwen/qwen2.5-0.5b-instruct_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
109.65 | 0.064 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
395.77 | 0.249 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | huggingface-causal | pytorch | Qwen/Qwen2.5-0.5B-Instruct | BF16 | none | 4096 | 2048 | 256 | 8 | 96 | huggingface-causal | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_pytorch_qwen/qwen2.5-0.5b-instruct_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
2194.91 | 0.238 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 75F3 32-Core Processor | openai-compatible | vllm | Qwen/Qwen2.5-7B-Instruct | bf16 | none | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-52-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
3090.77 | 0.228 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 75F3 32-Core Processor | openai-compatible | vllm | Qwen/Qwen2.5-7B-Instruct | fp8 | fp8 | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-52-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
418.25 | 0.017 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026
316.21 | 0.009 | Apple MPS | Apple M2 Pro | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 512 | 128 | 32 | 2 | 8 | pytorch-simple-transformer | 2.11.0 | macOS-26.3.1-arm64-arm-64bit | 3.10.20 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p128_g32_ctx512_c2_v2 | 2.3 | 0.17.1 | Known profile | June 7, 2026
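
The Profile column appears to repeat the workload settings of each row in its suffix: the numbers after p, g, ctx and c match the Prompt Tokens, Generated Tokens, Context and Concurrency columns in every row listed here. A minimal Python sketch for recovering those settings from a profile name follows. The pattern is inferred from the rows above and is not a documented naming scheme. The helper name parse_profile and the key names are hypothetical, and reading the trailing v2 as a profile version is an assumption.

```python
import re

# Assumed suffix layout, inferred from the Profile column only:
#   ..._inference_p<prompt>_g<generated>_ctx<context>_c<concurrency>_v<version>
PROFILE_SUFFIX = re.compile(
    r"_inference_p(?P<prompt_tokens>\d+)_g(?P<generated_tokens>\d+)"
    r"_ctx(?P<context>\d+)_c(?P<concurrency>\d+)_v(?P<profile_version>\d+)$"
)


def parse_profile(profile: str) -> dict[str, int]:
    """Return the workload settings encoded at the end of a profile name."""
    match = PROFILE_SUFFIX.search(profile)
    if match is None:
        raise ValueError(f"unrecognized profile name: {profile!r}")
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    example = (
        "llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct"
        "_inference_p256_g256_ctx4096_c64_v2"
    )
    print(parse_profile(example))
```

Only the suffix is parsed because the prefix is not consistent across rows: later profile names carry an extra engine segment (pytorch, vllm) that earlier ones lack.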