In a previous blogpost, I made a simple observation about GPT-4 from a paper I had read on a whim. After finishing the post, I realised I never actually figured out how token dropping could occur; I had only learned the black-box rule that it can happen in batched MoE inference, without understanding why.
This post is here to fix that – to collect enough info from important MoE papers (and alleged GPT-4 leaks) to explain the full mechanism of token dropping.
It’s well-known at this point that GPT-4/GPT-3.5-turbo is non-deterministic, even at temperature=0.0. This is odd behavior if you’re used to dense decoder-only models, where temp=0 should imply greedy sampling, which should imply full determinism, because the logits for the next token should be a pure function of the input sequence and the model weights.
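To make that intuition concrete, here’s a minimal sketch (a toy stand-in, not anyone’s actual inference stack): if the logits are a pure function of the weights and the input tokens, then temperature-0 decoding degenerates to `argmax`, and every run must pick the same token.

```python
import numpy as np

def logits(weights: np.ndarray, token_ids: list[int]) -> np.ndarray:
    # Stand-in for a real forward pass: any fixed, side-effect-free
    # function of (weights, input tokens) has the same property.
    counts = np.zeros(weights.shape[0])
    for t in token_ids:
        counts[t % weights.shape[0]] += 1.0
    return weights.T @ counts

def greedy_next_token(weights: np.ndarray, token_ids: list[int]) -> int:
    # temperature=0.0 degenerates to argmax over the logits
    return int(np.argmax(logits(weights, token_ids)))

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))
prompt = [3, 14, 15]

# Same weights + same prompt -> same next token, every single run.
runs = {greedy_next_token(W, prompt) for _ in range(100)}
print(len(runs))  # → 1
```

The surprise with GPT-4, then, is that this purity assumption breaks: in batched MoE inference, the logits for your sequence can depend on what *else* is in the batch.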
Despite the dead appearance of this blog, I actually think about it surprisingly often! Over the years, I’ve written a number of draft blogs or summaries that simply ended up dumped at varying stages of completion.