
Grant A

For POC work in the AI+Web3 field, such as model fine-tuning and vertical model training.

1) Large language model:

| Graphics card configuration | Large model fine-tuning | Large model inference |
| --- | --- | --- |
| High configuration: 8 × Nvidia 3090 (48G × 8 = 384G memory) | 65B LLaMA(1) | 65B LLaMA(1), 70B LLaMA(2) |
| Standard configuration: 8 × Nvidia 3090 (24G × 8 = 192G memory) | 65B LLaMA(1) | 65B LLaMA(1), 70B LLaMA(2) |
| 4 × Nvidia 3090 (24G, without NVLink) | ChatGLM-6B, Vicuna-6B | Vicuna-13B, ChatGLM-6B |

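As a minimal sketch of how an 8-card configuration above would be used for inference, the snippet below shards a LLaMA(2)-class model across all visible GPUs with Hugging Face transformers and accelerate. The model ID is illustrative (the gated weights require access approval), and actual memory use depends on precision and context length.

```python
# Minimal multi-GPU inference sketch for the 8-card configurations above.
# Assumes transformers + accelerate are installed; the model ID is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # illustrative; access-gated on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 halves weight memory vs FP32: ~140G for 70B params
    device_map="auto",          # accelerate spreads layers across all visible GPUs
)

inputs = tokenizer("AI+Web3 is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```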

2) Text-to-Image service (stable diffusion):

1024 × 1024+ high-resolution facial features

| Graphics card configuration | Memory usage | Image generation speed |
| --- | --- | --- |
| Single card 3090 (24G) | 11G | 22s |
| Single card 2080Ti (22G) | 18G | 42s |

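A rough sketch of how the figures above could be reproduced on a single card: generate one 1024 × 1024 image and record wall time and peak VRAM. The table does not name a checkpoint, so the SDXL model ID below is an assumption.

```python
# Measure wall time and peak VRAM for one 1024x1024 generation on a single card.
# The checkpoint is an illustrative assumption; diffusers + torch required.
import time
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # illustrative SDXL checkpoint
    torch_dtype=torch.float16,
).to("cuda")

torch.cuda.reset_peak_memory_stats()
start = time.time()
image = pipe("portrait photo, detailed facial features",
             height=1024, width=1024).images[0]
print(f"time: {time.time() - start:.1f}s, "
      f"peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.1f}G")
image.save("out.png")
```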

  • CPU: AMD EPYC 7742 64-Core × 2

  • Memory: 2T

  • Storage: SATA 14T

※ Note on NVLink: NVLink is Nvidia's high-speed data interconnect, providing data-bus bandwidth equivalent to roughly 4 times PCIe x4. The RTX 3090 brought the NVLink interface to consumer-grade graphics cards, allowing two 3090s to be bridged and used as a single card, supplanting the older SLI setup. The feature was discontinued in the next-generation 4090 to push AI training toward dedicated professional accelerator cards such as the A100 (12 NVLink links) and the H100 (18 links). AINUR's custom 8-card computing miner instead uses the PCIe x16 interface, which provides data bandwidth comparable to NVLink and bypasses the 3090's two-card NVLink bottleneck.
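
To see which card pairs on a multi-GPU box can exchange data directly (over NVLink where bridged, otherwise the PCIe fabric), a quick peer-to-peer probe with PyTorch, sketched below, is enough:

```python
# Probe GPU-to-GPU peer access: direct memory transfers between two cards
# without staging through host RAM (NVLink if bridged, otherwise PCIe).
import torch

n = torch.cuda.device_count()
for i in range(n):
    name = torch.cuda.get_device_name(i)
    peers = [j for j in range(n)
             if j != i and torch.cuda.can_device_access_peer(i, j)]
    print(f"GPU{i} ({name}): peer access to {peers or 'none'}")
```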

※ Note on the simplified gradient instruction set Tinygrad: Tinygrad introduced a simplified gradient instruction set to challenge Nvidia's monopoly. The story is reminiscent of the 1980s RISC movement against Intel's CISC architecture, a movement that paved the way for the internet and mobile-internet eras; RISC ultimately prevailed over Intel. George Hotz's Tinygrad, a simplified gradient instruction set, is intended to replicate that historical experience.

Deep learning relies on Nvidia's GPU hardware and its corresponding software stack, CUDA; deep-learning frameworks such as PyTorch are likewise optimized for CUDA. They are defined by nearly two thousand kernels (think of them as specialized programs) tailored to the CUDA architecture, bearing a strong resemblance to Intel's old complex-instruction-set designs. George Hotz's simplified gradient instruction set (the essence of deep learning is gradient computation) aims to cut these computing kernels from thousands down to a few dozen, which greatly reduces the hardware complexity of gradient computation, provides a foundation for designing new simplified-gradient-instruction chips, and gives new life to a sea of cheap GPUs.
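
To make the idea concrete, here is a toy sketch, not Tinygrad's actual op set or API, of how a handful of primitive operations can compose into the higher-level operators that CUDA frameworks ship as thousands of specialized kernels:

```python
# Toy "simplified gradient instruction set": a few primitive ops
# (illustrative, not Tinygrad's real ops) compose into higher-level
# operators such as ReLU and matmul, so those never need their own kernels.
import numpy as np

# The entire primitive "instruction set" of this toy framework.
def mul(a, b): return a * b                        # element-wise multiply
def maximum(a, b): return np.maximum(a, b)         # element-wise max
def sum_reduce(a, axis): return a.sum(axis=axis)   # reduction over one axis

# Composite operators, built only from the primitives above.
def relu(x):
    return maximum(x, np.zeros_like(x))

def matmul(a, b):
    # (m, k) @ (k, n): broadcast to (m, k, n) with mul, then reduce over k.
    return sum_reduce(mul(a[:, :, None], b[None, :, :]), axis=1)

x = np.array([[1.0, -2.0], [3.0, 4.0]])
w = np.array([[0.5, 0.0], [0.0, 0.5]])
print(relu(matmul(x, w)))  # [[0.5 0.], [1.5 2.]]
```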

Configuration list that can be used for corresponding model inference.

| Model | Nvidia 3090 × 2 (24G × 2) | Nvidia 2080Ti × 4 (22G × 4) | Nvidia 3090 × 8 (24G × 8) | Nvidia 3090 × 8 (48G × 8) |
| --- | --- | --- | --- | --- |
| Stable Diffusion | √ | √ | √ | √ |
| ChatGLM-6B, ChatGLM2-6B | √ | √ | √ | √ |
| Llama-6B, Vicuna-6B, Llama2-7B | √ | √ | √ | √ |
| Llama-13B, Vicuna-13B, Llama2-13B | | √ | √ | √ |
| Llama-33B, Vicuna-33B | | | √ | √ |
| Llama-65B, Vicuna-65B, Llama2-70B | | | | √ |
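
The pattern in the table follows a simple memory budget. As a rule-of-thumb sketch (an assumption chosen to reproduce the table, not a published formula), FP16 inference needs about 2 bytes per parameter for the weights, and roughly double that as a total budget once activations, KV cache, and fragmentation are included:

```python
# Heuristic memory budget that reproduces the table above: 2 bytes/param
# for FP16 weights, doubled for activations, KV cache, and fragmentation.
# An assumed rule of thumb, not a measurement.
CONFIGS_GB = {
    "Nvidia 3090 x 2 (24G x 2)": 48,
    "Nvidia 2080Ti x 4 (22G x 4)": 88,
    "Nvidia 3090 x 8 (24G x 8)": 192,
    "Nvidia 3090 x 8 (48G x 8)": 384,
}
MODELS_B = {"ChatGLM2-6B": 6, "Llama2-7B": 7, "Llama2-13B": 13,
            "Vicuna-33B": 33, "Llama2-70B": 70}

def fits(params_billions: float, vram_gb: float) -> bool:
    return params_billions * 2 * 2.0 <= vram_gb  # weights x 2 headroom

for cfg, gb in CONFIGS_GB.items():
    ok = [m for m, b in MODELS_B.items() if fits(b, gb)]
    print(f"{cfg}: {', '.join(ok)}")
```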
