Podcast Episode
DeepSeek Reveals Mystery MODEL1 on R1 Anniversary, Signals Major AI Architecture Shift
January 21, 2026
On the first anniversary of its market-rattling R1 release, Chinese AI startup DeepSeek has revealed a new model called MODEL1 through updates to its FlashMLA code repository on GitHub, where the mysterious identifier appears 28 times across 114 files. The disclosure came exactly one year after DeepSeek's R1 debut triggered what venture capitalist Marc Andreessen called AI's "Sputnik moment," wiping 593 billion dollars from Nvidia's market value in a single day as investors questioned whether the billions being spent on AI infrastructure were necessary when similar results could be achieved at dramatically lower cost.
Technical Architecture Breakthrough
Developers who analyzed the updated codebase found that MODEL1 represents a new architecture distinct from DeepSeek V3.2, codenamed V32 in the repository. The differences in code logic point to changes in key-value cache layout, sparsity handling, and FP8 data format decoding, suggesting the company has undertaken targeted restructuring for improved memory optimization and computational efficiency.

The disclosure came through DeepSeek's FlashMLA repository, which houses the company's efficient Multi-Head Latent Attention decoding kernel optimized for Nvidia Hopper GPUs. According to posts on Reddit's LocalLLaMA community, the FlashMLA source code underwent an update adding extensive support for MODEL1, including compatibility with Nvidia's upcoming Blackwell architecture alongside current Hopper chips.
The code changes reportedly show MODEL1 reverting to a unified standard dimension of 512 and introducing features described as Value Vector Position Awareness, along with what may be implementations of DeepSeek's recently published Engram conditional memory system.
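One of the changes reportedly touches FP8 decoding. As background only (this is not DeepSeek's kernel code), here is a minimal pure-Python decoder for the OCP FP8 E4M3 format used on Hopper-class GPUs, showing what unpacking one such byte involves:

```python
def fp8_e4m3_decode(byte: int) -> float:
    """Decode one OCP FP8 E4M3 byte (1 sign, 4 exponent, 3 mantissa bits).

    Illustrative background only -- real decode kernels do this in hardware
    or with vectorized bit tricks, not per-byte Python.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F   # 4-bit exponent field, bias 7
    man = byte & 0x07          # 3-bit mantissa field
    if exp == 0x0F and man == 0x07:
        return float("nan")    # E4M3 reserves only this pattern for NaN
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6   # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

# Spot checks against the E4M3 format definition:
print(fp8_e4m3_decode(0x38))  # 1.0
print(fp8_e4m3_decode(0x40))  # 2.0
print(fp8_e4m3_decode(0x7E))  # 448.0, the largest finite E4M3 value
```

Trading exponent range for an extra mantissa bit (versus E5M2) is what makes E4M3 the usual choice for activations and KV-cache values, where dynamic range is modest but precision matters.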
Engram Conditional Memory Innovation
DeepSeek released a technical paper on January 12, 2026, detailing Engram, a conditional memory technique that separates static pattern retrieval from dynamic reasoning. The system first queries a lookup table, checking whether similar knowledge or patterns have been stored previously. If a match is found, the model retrieves that information directly via an O(1) lookup rather than recomputing it through the neural network.

The main model stays on the GPU, but a large chunk of stored information is offloaded to ordinary system RAM. During inference, the system can asynchronously retrieve embeddings from host CPU memory over PCIe while the GPU computes the preceding transformer blocks. Researchers demonstrated this with a 100 billion parameter embedding table offloaded entirely to host DRAM, keeping the throughput penalty below 3 percent.
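The retrieve-while-compute overlap described above can be sketched in plain Python, with a background thread standing in for the PCIe transfer. Every name here (`HostEmbeddingTable`, `compute_block`, `forward`) is a hypothetical illustration of the pattern, not code from DeepSeek's paper or repository:

```python
from concurrent.futures import ThreadPoolExecutor

class HostEmbeddingTable:
    """Embedding store kept in host RAM; a hash lookup is O(1)."""
    def __init__(self):
        self._table = {}
    def put(self, key, vector):
        self._table[key] = vector
    def get(self, key):
        return self._table.get(key)   # None on a miss

def compute_block(x):
    # Stand-in for a transformer block running on the accelerator.
    return [v * 2 for v in x]

def forward(x, pattern_key, memory, pool):
    # Kick off the host-side retrieval *before* the device work, so the
    # "PCIe transfer" overlaps the computation of the preceding block.
    fetch = pool.submit(memory.get, pattern_key)
    hidden = compute_block(x)          # device work proceeds concurrently
    cached = fetch.result()           # join: embedding has (ideally) arrived
    if cached is not None:
        # Hit: fold the stored pattern in instead of recomputing it.
        return [h + c for h, c in zip(hidden, cached)]
    memory.put(pattern_key, hidden)    # miss: store for future O(1) reuse
    return hidden

memory = HostEmbeddingTable()
with ThreadPoolExecutor(max_workers=1) as pool:
    first = forward([1.0, 2.0], "fact:42", memory, pool)   # miss: compute path
    second = forward([1.0, 2.0], "fact:42", memory, pool)  # hit: retrieval path
```

The key property is that the lookup costs one hash probe regardless of table size, which is why the table can grow to billions of parameters in host DRAM without the GPU waiting on it, provided the prefetch is issued early enough to hide the transfer latency.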
Through systematic experiments, DeepSeek found the optimal balance at 75 percent of sparse model capacity allocated to dynamic reasoning and 25 percent to static lookups. Complex reasoning benchmarks jumped from 70 percent to 74 percent accuracy, while knowledge-focused tests improved from 57 percent to 61 percent. The approach enables efficient retrieval from contexts exceeding one million tokens by treating foundational facts as lookups rather than recomputing them.
V4 Model Expected in February
Reports indicate DeepSeek plans to release its next generation V4 model around mid-February 2026, timed to coincide with Lunar New Year on February 17. Internal tests by DeepSeek employees reportedly suggest V4 could outperform rivals from Anthropic and OpenAI on coding benchmarks, particularly when handling extremely long code prompts.

The V4 model is expected to integrate DeepSeek's newly published Engram architecture. The anticipated open source release would make V4 one of the most capable freely available coding models, continuing DeepSeek's pattern of matching proprietary performance at dramatically lower inference cost. DeepSeek V4 is reportedly designed to run on consumer-grade hardware, such as dual Nvidia RTX 4090s or a single RTX 5090.
The R1 Legacy and Cost Efficiency
DeepSeek's R1 model reportedly cost under 6 million dollars to train, a fraction of the billions invested by Silicon Valley competitors, while matching or exceeding OpenAI's o1 model on math and coding benchmarks. The Chinese firm has since released V3.1 in August and V3.2 in December, the latter described as offering performance equivalent to OpenAI's GPT-4-level models.

The big question remains whether MODEL1 and V4 are the same project or whether DeepSeek is preparing to release two separate models. DeepSeek has not officially commented on MODEL1 or confirmed specific release timing for V4. The company's track record of achieving state-of-the-art performance at dramatically reduced computational cost continues to challenge assumptions about the resources required for frontier AI development.
Published January 21, 2026 at 11:10am