Research Index
My current writing is focused on large language models, reasoning, retrieval, model architecture, evaluation, and the operational systems around AI.
Focus Areas
AI
18 articles in this area.
NLP
14 articles in this area.
Novel Research
14 articles in this area.
Technical
4 articles in this area.
Latest Research Notes
Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting
Introduction Background Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities, known as catastrophic forgetting. This phenomenon has been observed in...
Agent-based Automated Claim Matching with Instruction-following LLMs
Introduction Background Automated fact-checking pipelines rely on claim matching to identify claims that can be verified using the same evidence or fact-check. This task is crucial for scaling...
The 10,000x Explosion: Reproducing DeepSeek’s mHC at Scale
mHC: Manifold-Constrained Hyper-Connections
Introduction Background Deep neural network architectures have evolved significantly since the introduction of ResNets in 2016, with residual connections becoming a cornerstone of modern models like Transformers and...
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Introduction Background Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare, where accurate expressions of uncertainty are essential for reliability and...
The Zero Temperature Myth: Why "Greedy" Doesn't Always Mean "Same"
Collaboration
I am open to collaboration on AI, machine learning, and software engineering research. Reach me at weiherng1208@gmail.com, or browse the full archive and topic map.