


# The RWKV Language Model (and my LM tricks)

## RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)

RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). You only need the hidden state at position t to compute the state at position t+1, and you can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.

So it combines the best of RNN and transformer: great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).

Raven 14B (finetuned on Alpaca+ShareGPT+...) Demo:

RWKV-4-World is the best model: generation & chat & code in 100+ world languages, with the best English zero-shot & in-context learning ability too.

Quick inference with the `rwkv` pip package (the token ids below are illustrative):

```python
import os
os.environ['RWKV_CUDA_ON'] = '0'  # if '1' then use CUDA kernel for seq mode (much faster)

from rwkv.model import RWKV  # pip install rwkv
model = RWKV(model='/fsx/BlinkDL/HF-MODEL/rwkv-4-pile-1b5/RWKV-4-Pile-1B5-20220903-8040', strategy='cuda fp16')

out, state = model.forward([187, 510, 1563, 310, 247], None)  # use 20B_tokenizer.json
print(out.detach().cpu().numpy())                             # get logits

out, state = model.forward([187, 510], None)
out, state = model.forward([1563], state)      # RNN has state (use deepcopy if you want to clone it)
out, state = model.forward([310, 247], state)
print(out.detach().cpu().numpy())              # same result as above
```
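Because `forward` hands back the recurrent state, generation never has to re-process the prompt. Here is a small sketch of a greedy decoding loop on top of the snippet above; it assumes the `tokenizers` package and the `20B_tokenizer.json` file named in the comment, the prompt text is illustrative, and real use would sample with temperature/top-p instead of argmax:

```python
import numpy as np
from copy import deepcopy
from tokenizers import Tokenizer  # pip install tokenizers

tokenizer = Tokenizer.from_file('20B_tokenizer.json')
tokens = tokenizer.encode('\nThe quick brown fox').ids

out, state = model.forward(tokens, None)  # "GPT" prefill: process the whole prompt in one call
prompt_state = deepcopy(state)            # states are mutable; clone before branching generations

for _ in range(16):
    token = int(np.argmax(out.detach().cpu().numpy()))   # greedy pick (illustrative)
    print(tokenizer.decode([token]), end='', flush=True)
    out, state = model.forward([token], state)           # "RNN" step: one token + carried state
```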

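The "free sentence embedding" mentioned above falls out of the same API. This is a hypothetical recipe, assuming the state returned by the pip package is a flat list of per-layer tensors (true for RWKV-4, but verify for your version); the pooling choice is mine, not an official method:

```python
import torch

# Run a sentence through the model and keep only the final recurrent state.
_, state = model.forward(tokenizer.encode('A sentence to embed.').ids, None)

# Crude pooling: concatenate every state tensor into one vector
# (assumption: state is a flat list of torch tensors, as in RWKV-4).
embedding = torch.cat([s.flatten().float().cpu() for s in state])
embedding = embedding / embedding.norm()  # normalize for cosine similarity
```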
RWKV introduction, and in 100 lines of numpy:
A cool paper (Spiking Neural Network) using RWKV:
RWKV in 150 lines (model, inference, text generation):

Cool Community RWKV Projects (check them!):

- Fastest GPU inference API with vulkan (good for nvidia/amd/intel)
- Fast CPU/cuBLAS/CLBlast inference: int4/int8/fp16/fp32
- More RWKV projects:

Join Our Discord: (lots of developers)

RWKV 1B5-4k is mostly flat after ctx1500, but 3B-4k and 7B-4k and 14B-4k have some slopes, and they are getting better. This debunks the old view that RNNs cannot model long ctxlens. We can predict that RWKV 100B will be great, and RWKV 1T is probably all you need :)

We have plenty of potential compute (A100 40Gs) now (thanks to Stability and EleutherAI), so if you have interesting ideas I can run them. You are welcome to join the RWKV discord to build upon it.
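The reason a fixed-size state can keep up with long ctxlens is visible in the time-mixing ("WKV") formula itself. Below is a minimal numpy sketch (variable names and sizes are illustrative, and the numerical stabilization used in the real kernels is omitted) showing that the sequential "RNN" mode reproduces the parallel "GPT" mode exactly while carrying only a constant amount of state per channel:

```python
import numpy as np

T, C = 8, 4                      # sequence length, number of channels
rng = np.random.default_rng(0)
k = rng.normal(size=(T, C))      # keys
v = rng.normal(size=(T, C))      # values
w = np.exp(rng.normal(size=C))   # per-channel decay rate (> 0)
u = rng.normal(size=C)           # "bonus" applied to the current token

# "GPT" mode: each position attends to its whole past with exponential decay.
gpt = np.zeros((T, C))
for t in range(T):
    num = np.exp(u + k[t]) * v[t]
    den = np.exp(u + k[t])
    for i in range(t):
        wt = np.exp(-(t - 1 - i) * w + k[i])
        num += wt * v[i]
        den += wt
    gpt[t] = num / den

# "RNN" mode: identical outputs, but only two vectors of state are carried.
rnn = np.zeros((T, C))
a = np.zeros(C)                  # running decayed sum of exp(k) * v
b = np.zeros(C)                  # running decayed sum of exp(k)
for t in range(T):
    rnn[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
    a = np.exp(-w) * a + np.exp(k[t]) * v[t]
    b = np.exp(-w) * b + np.exp(k[t])

assert np.allclose(gpt, rnn)     # the two modes agree
```

The "100 lines of numpy" introduction linked above fleshes this recurrence out into the full model.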
