Exploring the
Frontiers of AI
Deep dives into machine learning, neural architectures, and the future of artificial intelligence.
Latest Post
Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting
Adapting language models (LMs) to new tasks via post-training carries the risk of degrading existing capabilities, known as catastrophic forgetting. This phenomenon has been observed in both supervised fine-tuning (SFT) for instruction following and reinforcement learning (RL) for preference alignment. However, the comparative...
Latest Publications
Agent-based Automated Claim Matching with Instruction-following LLMs
Automated fact-checking pipelines rely on claim matching to identify claims that can be verified using the same evidence or fact-check. This task is crucial for scaling fact-checking efforts, as it helps in grouping...
The 10,000x Explosion: Reproducing DeepSeek’s mHC at Scale
mHC: Manifold-Constrained Hyper-Connections
Deep neural network architectures have evolved significantly since the introduction of ResNets in 2016, with residual connections becoming a cornerstone of modern models like Transformers and large language models (LLMs). Hyper-Connections (HC) extended...
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare, where accurate expressions of uncertainty are essential for reliability and trust. However, current LLMs often generate incorrect...
The Zero Temperature Myth: Why "Greedy" Doesn't Always Mean "Same"
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
Large language models (LLMs) excel at generating high-quality, fluent text but often produce hallucinations: statements that misalign with established world knowledge or provided input context. Measuring hallucinations is challenging due to the open-ended nature...
Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training
Diffusion models have revolutionized generative AI, achieving state-of-the-art performance in tasks like image, audio, and video generation. However, a key challenge is understanding why they generalize well without memorizing training data, despite being...
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Large Language Models (LLMs) have shown impressive capabilities in complex reasoning tasks through Chain-of-Thought (CoT) prompting, which generates intermediate reasoning steps in natural language. However, standard CoT is constrained to discrete token embeddings...
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Recent breakthroughs in reasoning-centric Large Language Models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have been largely driven by Reinforcement Learning with Verifiable Rewards (RLVR). In this paradigm, models are trained on tasks like...
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Scaling model size has been a highly effective recipe in many areas of machine learning, driven by breakthroughs in language and vision models like Llama 3 and Stable Diffusion. However, comparable progress in...
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Code: https://github.com/liweijiang/artificial-hivemind Dataset: INFINITY-CHAT Collection
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Gating mechanisms have been widely used in neural networks since early models like LSTM and Highway Networks to control information loss across time steps or layers. This extends to modern architectures such as...
LightMem: Lightweight and Efficient Memory-Augmented Generation
Memory is fundamental to intelligent systems: it supplies the prior experience, contextual cues, and task-specific knowledge needed for robust reasoning and decision-making.
The Cockpit of AI: A Beginner’s Guide to LLM Parameters
When you use an LLM (Large Language Model) through an API like OpenRouter, you aren’t just sending a text message and hoping for the best. You actually have access to a “cockpit” of dials and...
Inside vLLM: How This Amazing Engine Makes Large Models Lightning Fast
🚀 Supercharging AI: A Simple Guide to How vLLM Works
BriLLM: Brain-inspired Large Language Model
Bottlenecks of Artificial General Intelligence (AGI): a disconnection between language models and world models, and limitations of Transformer-based architectures in conventional representation learning.
Revisiting Long-Context Modeling From Context Denoising Perspective
Long-context models (LCMs) have advanced significantly and can now handle up to millions of tokens. However, some researchers have found that LCMs are affected by contextual noise such as...