Values of Tiny Corp (George Hotz's company)
① If XLA/PrimTorch is CISC, tinygrad is RISC: CISC (Complex Instruction Set Computing) uses a larger, more complex instruction set in which a single instruction can perform many low-level operations. RISC (Reduced Instruction Set Computing) uses a smaller set in which each instruction performs only one low-level operation, which makes instructions faster and more efficient to execute. If you've used Apple Silicon (M1/M2) or a Raspberry Pi, you've used a RISC computer.
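To make the analogy concrete, here is a minimal sketch against tinygrad's high-level Tensor API (assuming a recent tinygrad release; the import path may differ across versions). Even a "complex" operation such as a matmul followed by a ReLU is lowered into a handful of simple elementwise, reduce, and movement primitives, the RISC-style instruction set described above.

```python
# Minimal sketch, assuming a recent tinygrad release (import paths vary by version).
from tinygrad import Tensor

x = Tensor.randn(4, 8)
w = Tensor.randn(8, 2)

# High-level ops are decomposed into a small set of primitive elementwise,
# reduce, and movement ops -- the framework's RISC-like instruction set.
y = (x @ w).relu()
print(y.numpy())
```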
② If you can't write a fast machine learning framework for GPUs, you can't write one for your own chips: Many "AI chip" companies start by manufacturing chips. Companies like Cerebras are still building, while companies like Graphcore are in a difficult period. But merely building a chip with higher TFLOPS isn't enough: "There's already a great chip on the market. For just $999, you can get a card with 123 TFLOPS and 24 GB of 960 GB/s RAM. This is the chip with the highest FLOPS per dollar today, but no one uses it in machine learning." This refers to the AMD RX 7900 XTX. NVIDIA's lead comes not just from high-performance cards but also from a great developer platform, CUDA. Because starting with chip fabrication costs far more than starting with developer tools, Tiny Corp decided to begin by writing a framework for existing hardware rather than manufacturing its own chips.
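The back-of-the-envelope arithmetic behind that quote is straightforward (numbers taken from the quote above, not independently verified here):

```python
# FLOPS-per-dollar arithmetic for the AMD RX 7900 XTX, using the figures quoted above.
price_usd = 999
tflops = 123
print(f"{tflops / price_usd * 1000:.1f} GFLOPS per dollar")  # ~123.1 GFLOPS/$
```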
③ Turing completeness considered harmful: Once you call into a Turing-complete kernel, you can no longer reason about its behavior; such kernels are more complex because they must be able to execute any program. To optimize them you can only rely on caching, warp scheduling, and branch prediction. Since neural networks only need ADD/MUL operations and rely on static memory access patterns, Turing completeness is unnecessary. This design decision lets tinygrad optimize instructions at a much lower level. As you might have guessed, CUDA is Turing complete; this is the main difference Tiny Corp hopes to exploit to be competitive.
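A sketch of what "only ADD/MUL with static memory access" means in practice (plain Python for illustration, not tinygrad internals): every loop bound and every memory address is known before the kernel runs, so a compiler can analyze and schedule it completely ahead of time, which is impossible for arbitrary Turing-complete code.

```python
# Illustrative only: a neural-net style kernel with no data-dependent control flow.
def matmul_kernel(A, B, C, N):
    # All loop bounds are fixed ahead of time and every access pattern is
    # static, so the full schedule can be derived without running the code.
    for i in range(N):
        for j in range(N):
            acc = 0.0
            for k in range(N):
                acc += A[i][k] * B[k][j]   # only ADD and MUL
            C[i][j] = acc
```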
AINUR and P1X believe that following and building on tinygrad in the spirit of Web3 and DAOs, as a counterweight to the big incumbent platforms, is both necessary and feasible.
Today's established compute providers, above all NVIDIA, deliver fast AI computing through tightly co-optimized CUDA software and hardware, and they integrate closely with the major frameworks such as TensorFlow and PyTorch. They also bring a rich software ecosystem: NVIDIA ships a family of CUDA libraries, such as cuDNN, cuBLAS, and TensorRT, that are optimized for deep learning and further raise GPU performance, plus debugging and profiling tools such as nvprof and Nsight that are very useful for tuning CUDA programs. As a result, training and optimization with these frameworks on NVIDIA hardware is currently the most efficient and convenient path.
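For example, a single line of PyTorch on an NVIDIA GPU is silently backed by those libraries (sketch assumes a CUDA-enabled PyTorch build is installed):

```python
# Sketch: PyTorch dispatches this matmul to NVIDIA's optimized libraries
# (cuBLAS and friends) when a CUDA device is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(c.device)
```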
However, traditional AI frameworks are usually complex: TensorFlow and PyTorch expose a huge number of functions and options, much as CISC exposes many complex instructions. This makes them powerful, but it also makes their learning curves steeper, their codebases larger, and their runtime efficiency potentially worse than that of leaner frameworks. We therefore believe that a DAO-based collaborative incentive mechanism can bring more grassroots hardware into AI computing and make better use of the power of the open-source ecosystem. tinygrad's design principle is simplicity and efficiency: it provides only a small set of the most basic and essential operations, just as RISC provides only a minimal set of instructions. This keeps the tinygrad codebase very lean and makes tinygrad simpler to learn and use.
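As a rough illustration of that RISC-like philosophy, a higher-level operation such as softmax can be written entirely from a few primitive ops. The helper below is a sketch against tinygrad's Tensor API (keyword names may differ slightly between versions), not code from the tinygrad repository.

```python
# Sketch, not tinygrad source: softmax built from a handful of primitives.
from tinygrad import Tensor

def softmax(x: Tensor) -> Tensor:
    # sub, exp, sum, div -- each one a simple elementwise or reduce primitive
    e = (x - x.max(axis=-1, keepdim=True)).exp()
    return e / e.sum(axis=-1, keepdim=True)

print(softmax(Tensor([[1.0, 2.0, 3.0]])).numpy())
```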