DeepSeek Core Readings

Status: In progress

These are a set of personal notes about the DeepSeek core readings (extended).

https://pbs.twimg.com/media/GQzX36eXIAANDLR?format=jpg&name=4096x4096

They are not meant for mass public consumption (though you are free to read/cite), as I will only be noting down information that I care about.

| Links | Post | TL;DR |
| --- | --- | --- |
| 2401.14196, Repo | DeepSeek Coder | V1 1.3/6.7/33B code models. Evals beat other OSS + GPT-3.5. Arguably bad post-training + continued pretraining. |
| 2401.02954, Repo | DeepSeek LLM | V1 7B/67B Base/Chat models. Great details about scaling laws, thoughtful evaluation/alignment. |
| 2401.06066, Repo | DeepSeek MoE | |
| 2402.03300, Repo | DeepSeek Math | |
| 2403.05525, Repo | DeepSeek VL | |
| paper.pdf, Repo | DeepSeek V2 | |
| paper.pdf, Repo | DeepSeek Coder V2 | |

DeepSeek Core Readings 0 - Coder

Paper summary: 1.3B to 33B LLMs trained on 2T code tokens (87 languages) with FiM (fill-in-the-middle) objectives and a 16K sequence length (rough FiM sketch below). Strong effort in constructing the pretraining data from GitHub from scratch, with repository-level samples. Evals beat OSS code models solidly and GPT-3.5 by a bit; Coder-6.7B often beats CodeLlama-34B.
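
For my own reference, a minimal sketch of how a fill-in-the-middle (FiM) sample could be assembled in prefix-suffix-middle (PSM) order. The sentinel token strings and the 50% FiM rate here are illustrative assumptions, not necessarily the exact ones the paper uses.

```python
import random

# Placeholder sentinel strings; the real special tokens live in the model's tokenizer.
FIM_BEGIN, FIM_HOLE, FIM_END = "<fim_begin>", "<fim_hole>", "<fim_end>"

def make_fim_sample(doc: str, fim_rate: float = 0.5) -> str:
    """With probability `fim_rate`, rewrite a code document into a
    prefix-suffix-middle (PSM) FiM sample; otherwise keep it as a plain
    left-to-right sample."""
    if random.random() > fim_rate:
        return doc  # ordinary next-token prediction
    # Pick two cut points, splitting the document into prefix / middle / suffix.
    i, j = sorted(random.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # PSM order: the model conditions on prefix + suffix and learns to emit the middle.
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}{middle}"

if __name__ == "__main__":
    random.seed(0)
    print(make_fim_sample("def add(a, b):\n    return a + b\n"))
```

As I understand it, samples like these, plus repository-level concatenations of files (ordered by their import dependencies), are then packed into the 16K-token training windows.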

They don’t spend much effort on instruction tuning. They also do continued pretraining from DeepSeek LLM into a Coder model: I believe it underperforms; they don’t.


I will be skipping the following papers:

| Paper | Reason |
| --- | --- |
| DeepSeek Prover | No code, no weights, no data. Also not really interested in Lean-related LLM projects. |