<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>152334H</title><link>https://152334H.github.io/</link><description>152334H Personal Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sat, 14 Feb 2026 00:00:00 +0800</lastBuildDate><atom:link href="https://152334H.github.io/index.xml" rel="self" type="application/rss+xml"/><item><title>Blog Refurbishment</title><link>https://152334H.github.io/blog/blog-refurbishment/</link><pubDate>Sat, 13 Aug 2022 05:30:32 +0100</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/blog-refurbishment/</guid><description>Under new management</description></item><item><title>Are exploits free?</title><link>https://152334H.github.io/blog/kctf-eval/</link><pubDate>Sat, 14 Feb 2026 00:00:00 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/kctf-eval/</guid><description>&lt;p>For various reasons, I had to develop a kernel n-day exploit last week.&lt;/p></description></item><item><title>Truncated Thoughts on 2025</title><link>https://152334H.github.io/blog/2026/</link><pubDate>Thu, 01 Jan 2026 00:00:00 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/2026/</guid><description>&lt;p>It has been, overall, a pretty shit year.&lt;/p></description></item><item><title>Time To Think</title><link>https://152334H.github.io/blog/time-to-think/</link><pubDate>Tue, 04 Feb 2025 01:02:03 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/time-to-think/</guid><description><![CDATA[<p>I don&rsquo;t remember what it&rsquo;s like to think with more than 60 seconds of context.</p>]]></description></item><item><title>Calculating the Cost of a Google Deepmind Paper</title><link>https://152334H.github.io/blog/scaling-exponents/</link><pubDate>Tue, 30 Jul 2024 00:00:00 
+0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/scaling-exponents/</guid><description><![CDATA[<p>Recently, GDM released a great paper titled, <a href="https://arxiv.org/pdf/2407.05872" target="_blank" rel="noopener noreffer "><em>Scaling Exponents Across Parameterizations and Optimizers</em></a>, in which they conduct over 10,000 LLM training runs to obtain optimal hyperparameters under different regimes.</p>
<p>After reading it (it was great), I wanted to test my understanding of the paper by tallying up all experiments conducted within, calculating <strong>the total compute cost it would take to replicate the paper</strong>.</p>]]></description></item><item><title>DeepSeek Core Readings 0 - Coder</title><link>https://152334H.github.io/blog/deepseek-0/</link><pubDate>Sun, 30 Jun 2024 00:00:00 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/deepseek-0/</guid><description><![CDATA[<p><a href="https://arxiv.org/pdf/2401.14196" target="_blank" rel="noopener noreffer ">Paper</a> summary: 1.3B to 33B LLMs on 1/2T code tokens (87 langs) w/ FiM and 16K seqlen. Strong effort in constructing pretraining data from Github from scratch, with repository-level samples. Evals beat OSS code models solidly + GPT-3.5 a bit; Coder-7B &gt; CodeLlama-33B often.</p>
<p>They don&rsquo;t spend much effort on Instruction tuning. They commit a continued pretrain of DeepSeek LLM -&gt; Coder: I believe it underperforms; they don&rsquo;t.</p>]]></description></item><item><title>DeepSeek Core Readings 1 - LLM</title><link>https://152334H.github.io/blog/deepseek-1/</link><pubDate>Sun, 23 Jun 2024 00:00:00 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/deepseek-1/</guid><description><![CDATA[<p><a href="https://arxiv.org/abs/2401.02954" target="_blank" rel="noopener noreffer ">Paper</a> summary: LLaMA-like 7B/67B pretrain (Base) + SFT&amp;DPO (Chat). 2T tokens with strong CN/EN mix, &gt;1mil SFT examples. Well-executed exploration of scaling laws. Good details about evals and safety. Not much described about their actual data.</p>]]></description></item><item><title>Basic tips for remaining conscious</title><link>https://152334H.github.io/blog/remain-conscious/</link><pubDate>Fri, 12 Apr 2024 12:56:00 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/remain-conscious/</guid><description><![CDATA[<p>I don&rsquo;t think very often. These are some of the things I&rsquo;ve tried in the past to rectify that.</p>]]></description></item><item><title>2023</title><link>https://152334H.github.io/blog/2023/</link><pubDate>Sun, 31 Dec 2023 23:25:26 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/2023/</guid><description>&lt;p>Let&amp;rsquo;s keep this short.&lt;/p></description></item><item><title>Rough thoughts on Mixtral vs Open Source</title><link>https://152334H.github.io/blog/mixtral-vs-oss/</link><pubDate>Wed, 13 Dec 2023 20:12:34 +0800</pubDate><author>152334H</author><guid>https://152334H.github.io/blog/mixtral-vs-oss/</guid><description><![CDATA[<p>Here&rsquo;s a <strong>thesis</strong> (hypothesis, predicate, etc) to chew on:</p>
<blockquote>
<p>The mixture-of-experts paradigm is fundamentally a hindrance to open source development, and mixtral-8x5B+2B will be summarily supplanted by a dense model like <a href="https://twitter.com/futuristflower/status/1716555972452184463" target="_blank" rel="noopener noreffer ">llama3</a>/<a href="https://techcrunch.com/2023/11/09/theres-something-going-on-with-ai-startups-in-france/" target="_blank" rel="noopener noreffer ">mistral-70b</a>/yi/qwen/&hellip; in the near future.</p>
</blockquote>]]></description></item></channel></rss>