Grant C
A large-scale rework of a 3090 server cluster, further built on the tinygrad architecture:
738 TFLOPS of FP16 compute
144 GB GPU RAM
5.76 TB/s RAM bandwidth
30 GB/s model loading bandwidth (a large LLaMA model loads in about 4 seconds)
AMD EPYC CPU
1600 W power draw (runs from a single 120 V socket)
Runs the 65B-parameter FP16 LLaMA model out of the box (using tinygrad)
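A quick back-of-envelope check shows how several of these numbers fit together. This is a rough sketch under stated assumptions, not vendor data: it assumes exactly 65e9 parameters, 2 bytes per FP16 weight, decimal units (1 GB = 1e9 bytes), and that loading is limited only by the 30 GB/s figure. The helper names `weights_gb` and `load_seconds` are hypothetical.

```python
# Back-of-envelope checks on the spec list.
# Assumptions (not stated in the spec): exactly 65e9 parameters, 2 bytes per
# FP16 weight, decimal units (1 GB = 1e9 bytes), and weight loading limited
# only by the quoted 30 GB/s model-loading bandwidth.

PARAMS = 65e9         # 65B-parameter LLaMA
BYTES_PER_FP16 = 2    # FP16 = 16 bits = 2 bytes
GPU_RAM_GB = 144      # total GPU RAM from the spec
LOAD_BW_GB_S = 30     # model loading bandwidth from the spec
POWER_W = 1600        # rated power from the spec
MAINS_V = 120         # single 120 V socket


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal gigabytes (hypothetical helper)."""
    return params * bytes_per_param / 1e9


def load_seconds(size_gb: float, bw_gb_s: float) -> float:
    """Time to stream the weights at a given bandwidth (hypothetical helper)."""
    return size_gb / bw_gb_s


size = weights_gb(PARAMS, BYTES_PER_FP16)
print(f"weights: {size:.0f} GB, fits in {GPU_RAM_GB} GB: {size <= GPU_RAM_GB}")
print(f"load time: {load_seconds(size, LOAD_BW_GB_S):.1f} s")
print(f"current draw at {MAINS_V} V: {POWER_W / MAINS_V:.1f} A")
```

Under these assumptions the FP16 weights come to about 130 GB, which fits within the 144 GB of GPU RAM, and streaming them at 30 GB/s takes roughly 4.3 s, consistent with the "about 4 seconds" figure. Real load times also depend on storage, file format, and runtime overhead, which this sketch ignores.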