| Generated tok/s | TTFT (s) | AI Processor | CPU | Backend | Engine | AI Model | Precision | Quantization | Context | Prompt Tokens | Generated Tokens | Concurrency | Requests | AI Framework | AI Framework Version | Operating System | Python Version | Profile | Benchmark Spec Version | SAIB Version | Status | Benchmark Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 166.73 | 0.006 | Apple MPS | Apple M2 Pro | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 512 | 128 | 256 | 1 | 5 | pytorch-simple-transformer | 2.11.0 | macOS-26.3.1-arm64-arm-64bit | 3.10.20 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx512_c1_v2 | 2.0 | 0.6.0 | Known profile | June 1, 2026 |
| 213.03 | 0.006 | Apple MPS | Apple M2 Pro | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 128 | 256 | 1 | 30 | pytorch-simple-transformer | 2.11.0 | macOS-26.3.1-arm64-arm-64bit | 3.10.20 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx4096_c1_v2 | 2.1 | 0.7.0 | Known profile | June 2, 2026 |
| 1370.83 | 0.001 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 128 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx4096_c1_v2 | 2.1 | 0.7.1 | Known profile | June 2, 2026 |
| 871.30 | 0.001 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 128 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-106-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p128_g256_ctx4096_c1_v2 | 2.1 | 0.7.1 | Known profile | June 2, 2026 |
| 122.95 | 0.008 | NVIDIA A100 80GB PCIe | AMD EPYC 7763 64-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-40-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 2, 2026 |
| 73.38 | 0.032 | NVIDIA A100 80GB PCIe | AMD EPYC 7763 64-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-40-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 2, 2026 |
| 275.07 | 0.003 | NVIDIA GeForce RTX 5090 | AMD EPYC 9354 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-63-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026 |
| 134.23 | 0.028 | NVIDIA GeForce RTX 5090 | AMD EPYC 9354 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-63-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026 |
| 56.77 | 0.053 | NVIDIA GeForce RTX 5090 | AMD EPYC 9354 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-63-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026 |
| 237.40 | 0.004 | NVIDIA H200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-53-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026 |
| 140.60 | 0.013 | NVIDIA H200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-53-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.8.0 | Known profile | June 3, 2026 |
| 102.84 | 0.009 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.9.0 | Known profile | June 3, 2026 |
| 90.31 | 0.052 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.9.0 | Known profile | June 3, 2026 |
| 36.27 | 0.107 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.9.0 | Known profile | June 3, 2026 |
| 90.26 | 0.010 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-57-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 89.34 | 0.059 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-57-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 91.08 | 0.010 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 96.44 | 0.058 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 181.38 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 78.42 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 180.67 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 78.68 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 180.89 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 78.34 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7K62 48-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-124-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 91.30 | 0.010 | NVIDIA A40 | Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-60-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 302.77 | 0.003 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 260.50 | 0.018 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 235.35 | 0.004 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8470 | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-90-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 184.68 | 0.013 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8470 | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-90-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 157.13 | 0.005 | NVIDIA L40S | AMD EPYC 9374F 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-110-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 156.42 | 0.023 | NVIDIA L40S | AMD EPYC 9374F 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-110-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 185.52 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7352 24-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-1059-oracle-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 76.95 | 0.032 | NVIDIA GeForce RTX 4090 | AMD EPYC 7352 24-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-5.15.0-1059-oracle-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 322.26 | 0.003 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 123.84 | 0.010 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 52.53 | 0.017 | NVIDIA RTX A4000 | AMD EPYC 7453 28-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 83.92 | 0.102 | NVIDIA RTX A4000 | AMD EPYC 7453 28-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-6.8.0-52-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 118.11 | 0.040 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 38.32 | 0.062 | NVIDIA GeForce RTX 4090 | AMD EPYC 7663 56-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.1 | 0.10.0 | Known profile | June 4, 2026 |
| 115.81 | 0.040 | NVIDIA RTX PRO 6000 Blackwell Server Edition | AMD EPYC 9355 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 31.62 | 0.065 | NVIDIA GeForce RTX 4090 | AMD EPYC 7543 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.4.1+cu124 | Linux-6.8.0-40-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 99.73 | 0.009 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 92.37 | 0.053 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 35.80 | 0.108 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 322.63 | 0.003 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 175.69 | 0.010 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 83.94 | 0.021 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 16.99 | 0.063 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 262.66 | 0.003 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 141.85 | 0.012 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 61.99 | 0.026 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 13.17 | 0.081 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-106-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 276.98 | 0.003 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 137.85 | 0.028 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 58.54 | 0.053 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 11.72 | 0.092 | NVIDIA GeForce RTX 5090 | INTEL(R) XEON(R) GOLD 6530 | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-107-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 193.12 | 0.005 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | pytorch-simple-transformer | - | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_simpletransformerlm_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 89.16 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | pytorch-kv-decoder | - | KVCacheDecoderLM-1B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | pytorch-kv-decoder | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-kv-decoder_kvcachedecoderlm-1b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 36.76 | 0.064 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | huggingface-causal | - | Qwen/Qwen3-1.7B | BF16 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 8.50 | 0.126 | NVIDIA GeForce RTX 4090 | AMD EPYC 7763 64-Core Processor | huggingface-causal-fp8 | - | Qwen/Qwen3-1.7B | FP8 | none | 4096 | 2048 | 256 | 1 | 30 | huggingface-causal-fp8 | 2.8.0+cu128 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_huggingface-causal-fp8_qwen/qwen3-1.7b_inference_p2048_g256_ctx4096_c1_v2 | 2.2 | 0.11.0 | Known profile | June 4, 2026 |
| 2059.80 | 0.302 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 9654 96-Core Emb Processor | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | BF16 | none | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 2726.03 | 0.291 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 9654 96-Core Emb Processor | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | FP8 | fp8 | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-111-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 4891.73 | 0.284 | NVIDIA B200 | Intel(R) Xeon(R) 6960P | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | BF16 | none | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-124-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 5406.31 | 0.291 | NVIDIA B200 | Intel(R) Xeon(R) 6960P | openai-compatible | - | Qwen/Qwen2.5-7B-Instruct | FP8 | fp8 | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-124-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 418.83 | 0.017 | NVIDIA B200 | Intel(R) Xeon(R) 6960P | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-124-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 307.58 | 0.020 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-106-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 209.19 | 0.033 | NVIDIA GeForce RTX 4090 | AMD EPYC 7542 32-Core Processor | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-6.8.0-64-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 618.24 | 0.060 | NVIDIA H100 80GB HBM3 | Intel(R) Xeon(R) Platinum 8480+ | huggingface-causal | pytorch | Qwen/Qwen2.5-0.5B-Instruct | BF16 | none | 4096 | 2048 | 256 | 8 | 96 | huggingface-causal | 2.4.1+cu124 | Linux-6.8.0-106-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_pytorch_qwen/qwen2.5-0.5b-instruct_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 381.68 | 0.166 | NVIDIA GeForce RTX 4090 | AMD EPYC 7542 32-Core Processor | huggingface-causal | pytorch | Qwen/Qwen2.5-0.5B-Instruct | BF16 | none | 4096 | 2048 | 256 | 8 | 96 | huggingface-causal | 2.4.1+cu124 | Linux-6.8.0-64-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_pytorch_qwen/qwen2.5-0.5b-instruct_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 109.65 | 0.064 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 395.77 | 0.249 | NVIDIA RTX A6000 | AMD EPYC 7543 32-Core Processor | huggingface-causal | pytorch | Qwen/Qwen2.5-0.5B-Instruct | BF16 | none | 4096 | 2048 | 256 | 8 | 96 | huggingface-causal | 2.4.1+cu124 | Linux-5.15.0-139-generic-x86_64-with-glibc2.35 | 3.11.10 | llm_huggingface-causal_pytorch_qwen/qwen2.5-0.5b-instruct_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 2194.91 | 0.238 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 75F3 32-Core Processor | openai-compatible | vllm | Qwen/Qwen2.5-7B-Instruct | BF16 | none | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-52-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 3090.77 | 0.228 | NVIDIA RTX 6000 Ada Generation | AMD EPYC 75F3 32-Core Processor | openai-compatible | vllm | Qwen/Qwen2.5-7B-Instruct | FP8 | fp8 | 4096 | 256 | 256 | 64 | 384 | openai-compatible | 0.11.0 | Linux-6.8.0-52-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct_inference_p256_g256_ctx4096_c64_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 418.25 | 0.017 | NVIDIA B200 | INTEL(R) XEON(R) PLATINUM 8568Y+ | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 4096 | 2048 | 256 | 8 | 96 | pytorch-simple-transformer | 2.8.0+cu128 | Linux-6.8.0-90-generic-x86_64-with-glibc2.39 | 3.12.3 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p2048_g256_ctx4096_c8_v2 | 2.3 | 0.17.1 | Known profile | June 6, 2026 |
| 316.21 | 0.009 | Apple MPS | Apple M2 Pro | pytorch-simple-transformer | pytorch | SimpleTransformerLM | FP32 | none | 512 | 128 | 32 | 2 | 8 | pytorch-simple-transformer | 2.11.0 | macOS-26.3.1-arm64-arm-64bit | 3.10.20 | llm_pytorch-simple-transformer_pytorch_simpletransformerlm_inference_p128_g32_ctx512_c2_v2 | 2.3 | 0.17.1 | Known profile | June 7, 2026 |
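
The Profile identifiers in the table appear to end with the workload parameters that are also listed in the Prompt Tokens, Generated Tokens, Context, and Concurrency columns (for example `p2048_g256_ctx4096_c1_v2`). The following is a minimal sketch, assuming that suffix layout holds for every row, of how those fields could be recovered from a profile string for cross-checking. The function name `parse_profile` and the key names are hypothetical, and the meaning of the trailing `v2` is an assumption; none of this is taken from the benchmark tool itself.

```python
import re

# Assumed suffix layout, inferred only from the Profile values in the table:
#   ..._inference_p<prompt>_g<generated>_ctx<context>_c<concurrency>_v<n>
_PROFILE_SUFFIX = re.compile(
    r"_inference_p(?P<prompt_tokens>\d+)"
    r"_g(?P<generated_tokens>\d+)"
    r"_ctx(?P<context>\d+)"
    r"_c(?P<concurrency>\d+)"
    r"_v(?P<suffix_version>\d+)$"
)


def parse_profile(profile: str) -> dict:
    """Return the numeric workload fields encoded at the end of a profile string.

    Hypothetical helper: raises ValueError if the string does not end with
    the assumed suffix layout.
    """
    match = _PROFILE_SUFFIX.search(profile)
    if match is None:
        raise ValueError(f"unrecognized profile identifier: {profile!r}")
    # Every captured group is a run of digits, so int() is safe here.
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    example = (
        "llm_openai-compatible_vllm_qwen/qwen2.5-7b-instruct"
        "_inference_p256_g256_ctx4096_c64_v2"
    )
    print(parse_profile(example))
```

Comparing the parsed values against the corresponding columns is one way to spot rows whose profile string and recorded parameters disagree.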