Don't rush to go all in on DeepSeek V4: first hear these 10 practitioners' honest takes
Text | Zhou Xinyu, Wang Yuchan
Editor | Yang Xuan
Interpreting the technical report of DeepSeek V4 has been one of the most feverish collective activities in the AI industry these days.
Is V4 powerful? From an engineering-optimization perspective, undoubtedly yes. In the past, everyone believed in the scaling law's "brute-force aesthetic": improve model performance by stacking more high-quality compute and ever-larger parameter counts. V4 takes a completely different path, defining instead an aesthetic of restraint in model training:
Rather than piling on compute and parameters, it relies on a combination of optimizations and redesigns:
Attention mechanism (enabling the model to “focus on key points,” just like humans automatically pay attention to critical sentences when reading long articles)
MoE architecture (Mixture of Experts, which can be understood as “different experts responsible for different types of problems, activating only a few experts each time, saving time and effort”)
Post-training (refining the model after initial training through targeted reinforcement)
Inference system engineering (optimizing the efficiency of various stages during actual operation)
The result of this approach is that V4-Pro’s required compute power for processing a long context of one million tokens (roughly hundreds of thousands of words) has been reduced to 27% of the previous generation V3.2, and the KV cache used for temporarily storing dialogue context (think of it as the model’s “notepad” during chat) has been compressed to 10% of the original.
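The MoE idea above (activate only a few experts per token, so compute scales with the active set rather than the full parameter count) can be sketched in a few lines. This is a toy NumPy illustration of top-k expert routing, not DeepSeek's actual architecture; the dimensions, gating, and linear "experts" are invented for clarity.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through a toy Mixture-of-Experts layer.

    Only the top-k experts (by gate score) run, so per-token compute
    scales with k, not with the total number of experts.
    """
    scores = x @ gate_w                      # one gate score per expert
    top = np.argsort(scores)[-k:]            # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
# Each "expert" here is just a fixed linear map, standing in for a feed-forward block.
mats = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda v, m=m: m @ v for m in mats]

y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With 16 experts and k=2, only 1/8 of the expert parameters are touched per token; the same ratio logic is how 1.6T total parameters can cost only 49B activated per step.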
However, engineering is just engineering, and rankings are just rankings.
When evaluating a model, we don’t want to stay only on paper parameters, but to discuss V4’s value in real deployment, development, and investment scenarios. To this end, we invited nearly 10 developers, application entrepreneurs, and investors to experience and test for about three days.
A counterintuitive conclusion first: DeepSeek’s impact on application layers may be greater than on the model layer.
Beneath the admiration for its engineering, DeepSeek is frank in the V4 technical report: its development trajectory lags the cutting-edge closed-source models by roughly 3 to 6 months. V4's current results read like a deal with the devil: extended reasoning and agent (intelligent system) capabilities, bought at the cost of some accuracy.
Closed-source model vendors can breathe a temporary sigh of relief. For the stable and precise business world, V4 is clearly not a directly deployable model.
Li Bojie, Chief Scientist of Pine AI, and Chillin, a founder of a leading Coding Agent startup, both told us directly that tool invocation stability and hallucination rate must be addressed at the harness layer (the “reins” and “seat belts” for guiding AI behavior and reducing errors). The deployment of V4 cannot do without scaffolding.
But the iteration direction of intelligent brains often influences the ecosystem of downstream applications. AI application startups will face more stringent tests from both technology and capital.
“The performance of the core model is still rapidly iterating”—this industry consensus also means that applications can be overturned by models at any time. An investor from a dual-currency fund cited many “yesterday’s news” cases: “Workflow, Coding…”
Chen Weipeng, founder and CEO of Yongyue Intelligence, summarized: the future barrier for AI applications is to organize models, agents, product scenarios, and data feedback into a reliable, low-cost, scalable production system.
Highlights: not only long text and coding ability, but also high capability at low cost
Preface: core advantages—code and agent capabilities
In several key code and software engineering evaluations, V4-Pro demonstrated the highest level among current open-source models, nearly on par with top closed-source models. The core data are summarized as follows:
[Image: summary table of V4-Pro's scores on key code and software-engineering benchmarks]
🧑🏫 Huang Dongxu, Co-founder and CTO of PingCAP
I am migrating my Hermes workflow to DeepSeek V4. Previously, I used Claude Opus and GPT5.4 for agents, but I found that most daily tasks don’t require very high coding ability.
Daily office tasks mainly include: (a) email sorting; (b) article writing; (c) calendar management; (d) content summarization; (e) web browsing.
Now I’ve fully switched to DeepSeek V4. Its performance exceeds my expectations, possibly optimized for Chinese, with overall language ability more aligned with native Chinese speakers than Opus and GPT.
So my first conclusion is: if you are currently using some more expensive models as your daily assistant agent, you can confidently switch to DeepSeek V4 Pro.
Its capability is roughly at the level of Claude Sonnet 4.5 to 4.6, but the price is less than a quarter of top-tier models. Now I basically no longer worry about agent costs.
DeepSeek V4’s paper emphasizes a 1M context window, but I feel this isn’t very strong, since mainstream SOTA models now generally support at least 1M tokens, so it’s just catching up.
Its real advantages are:
Truly very low cost;
It is an open-source model.
I no longer worry about supply cuts from Anthropic or OpenAI, which could make some of my workflows unusable—such things have happened before. With DeepSeek V4, I feel more secure.
Next, look at coding ability. Because the testing time is still short, I haven’t used it to develop very complex large-scale systems yet.
But for scenarios involving several thousand lines of code, small applications, or integrating with external third-party systems (like accessing tools on Supabase or TiDB Cloud by reading documentation), my experience so far is that there are basically no major issues.
In the scale of several thousand to ten thousand lines, V4’s success rate with one-shot (providing examples and instructions at once, without extra debugging) is still quite high.
So if you’re just doing some simple websites or small applications, I believe DeepSeek’s coding ability is much stronger than the previous generation.
My current harness framework doesn't involve very complex manual orchestration; it relies more on the model's own collaborative ability (I use Slock.ai).
In simple terms, there are two points:
It can collaborate with agents using other models;
It can complete some simple/targeted tasks.
Therefore, if there are some stronger models (like GPT5.5 level) guiding DeepSeek V4 Pro, and then it’s responsible for execution, I think this mode can greatly reduce the cost of harness engineering.
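The division of labor described here (a stronger model plans, a cheaper model executes) can be sketched as a simple two-tier loop. The stub functions below are invented stand-ins for API calls to a frontier model and to a cheaper model like V4 Pro; what matters is the call pattern: one expensive planning call per task, many cheap execution calls.

```python
# Toy sketch of the "strong planner, cheap executor" pattern.
# call_strong / call_cheap are invented stand-ins for real API calls;
# in practice each would hit a different model endpoint.

def call_strong(task: str) -> list[str]:
    # Pretend the frontier model decomposes the task into concrete steps.
    return [f"step {i}: {part.strip()}" for i, part in enumerate(task.split(","), 1)]

def call_cheap(step: str) -> str:
    # Pretend the cheaper model executes one concrete step.
    return f"done ({step})"

def run(task: str) -> list[str]:
    plan = call_strong(task)              # expensive call: once per task
    return [call_cheap(s) for s in plan]  # cheap calls: once per step

results = run("fetch docs, write client, add tests")
print(len(results))  # 3
```

Since execution steps usually outnumber planning calls by a wide margin, moving execution to the cheaper model is where most of the harness cost savings come from.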
🧑🏫 Zhao Binqiang, Vice President of Zero One All Things Technology and Product Center
DeepSeek V4 is not “the most versatile,” but it is “the most trustworthy”—steadfast open-source commitment, complete technical report, extremely low inference cost, and full-stack domesticization make it the most cost-effective foundational model choice for ToB (enterprise-facing) scenarios.
What amazes me most about DeepSeek V4 are two things.
First, the underlying innovation in model architecture. Maintaining high-quality reasoning ability within a 1 million token context window is backed by innovations in the hybrid attention mechanism. This can be simply understood as: “coarse reading” to grasp the overall meaning, “fine reading” for precise understanding of details.
Especially in the area of context compression, the exploration is very advanced, and DeepSeek openly shares detailed techniques in its technical report. This transparency and open-source spirit are extremely valuable in the fiercely competitive large model industry.
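The "coarse reading, then fine reading" intuition can be sketched as a two-stage attention pass: a cheap score over every position first, then full softmax attention over only the top-scoring subset. This is a toy NumPy illustration of the idea, not DeepSeek's actual hybrid-attention design.

```python
import numpy as np

def two_stage_attention(q, K, V, k=4):
    """Coarse pass: cheap dot-product scores over all positions.
    Fine pass: full softmax attention over only the top-k positions.
    Toy illustration of 'coarse reading, then fine reading'.
    """
    coarse = K @ q                          # one score per position (coarse read)
    keep = np.argsort(coarse)[-k:]          # keep only the most relevant positions
    fine = K[keep] @ q / np.sqrt(q.size)    # scaled scores on the kept subset
    w = np.exp(fine - fine.max())
    w /= w.sum()                            # softmax (fine read)
    return w @ V[keep]

rng = np.random.default_rng(1)
n, d = 1024, 16                             # 1024 positions, 16-dim head
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
out = two_stage_attention(q, K, V, k=8)
print(out.shape)  # (16,)
```

The fine pass touches only k of n positions, which is also why the KV "notepad" mentioned earlier can shrink so dramatically: most positions never need full-precision attention.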
Second, the full-stack adaptation of domestic computing power. DeepSeek has completed adaptation to Huawei Ascend 910B/950 chips, with meticulous work in quantization, sparsification mechanisms, and domain expert optimization.
This means that from chips to underlying software, to model training and inference, a fully domestic stack solution has taken a solid step forward. While not entirely free from reliance on NVIDIA’s ecosystem, it has found the right development direction. The difficulty and significance of this cannot be overstated.
🧑🏫 Li Bojie, Chief Scientist of Pine AI
The most impressive is that DeepSeek has successfully integrated a series of architectural innovations—MoE, CSA+HCA hybrid attention, mHC, Muon, FP4QAT—on the current largest open-source scale of 1.6 trillion parameters.
It’s like combining many theoretically advanced but often failing small-scale techniques into a giant engine that runs stably. We’ve tested over 20 architectures, and the conclusion is almost always: “feasible at 7 billion parameters, but fails or even backfires at larger scales.”
Most other models' architectural innovations are stuck at this same point. The ability to make multiple innovations work together at the largest scale shows that DeepSeek's underlying training technology runs very deep. Just one of its techniques, mHC, brings signal amplification in the 27B experiments down from nearly 3,000x to about 1.6x, making training stable and controllable.
🧑🏫 Song Chunyu, Vice President of Lenovo Group, Chief Investment Officer and Senior Partner of Lenovo Venture Capital
DeepSeek proves that “AI cost-effectiveness” can become a proactive structural advantage.
27% of the compute, only 10% of the GPU memory. Meanwhile, with 1.6 trillion total parameters but only 49 billion activated at a time, efficiency is extremely high.
This structural cost reduction, combined with V4-Flash API’s low price of 1 yuan per million tokens, makes “mass-market ultra-long context” a new benchmark for AI applications.
🧑🏫 Chen Weipeng, founder and CEO of Yongyue Intelligence
What excites me most about DeepSeek V4 is not just the improvement of a single capability, but that it indicates domestic large models have moved from “catching up with foundational capabilities” to “participating in agent-based system competition.”
In the past, people cared more about whether models could answer, reason, or code; but today, what truly matters is whether models can reliably complete complex tasks, and whether they can be integrated into real product systems at low cost and high efficiency.
Regrettably: for actual deployment, V4 still lacks some “scaffolding”
Preface: relative disadvantages in factual knowledge and extremely complex reasoning
DeepSeek’s official and evaluation platforms point out several obvious weaknesses of V4-Pro. To be more intuitive, we’ve summarized key weak points in the table below:
[Image: summary table of V4-Pro's main weak points]
🧑🏫 Li Bojie, Chief Scientist of Pine AI
I mainly use it for coding and agentic tasks. In these types of work:
V4-Pro’s tool-invocation ability and general world knowledge roughly match the next tier of cutting-edge models (comparable to Claude 4.6 Sonnet);
But tool invocation stability and hallucination rate are still major flaws—these must be addressed at the Agent Harness layer (like adding verification, automatic retries after failure, grounding with external knowledge bases, and strict, clear tool usage protocols). Otherwise, as task chains lengthen, errors will amplify;
Once these two issues are fixed at the harness layer, the overall reasoning cost can be several times lower than top-tier models. That’s the real leverage.
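A minimal sketch of the harness-layer fixes described above (schema validation plus automatic retry on malformed tool calls) might look like the following. The flaky model here is a stub invented for illustration; a real harness would wrap an actual API client.

```python
import json

def harness_call(model_fn, prompt, validate, max_retries=3):
    """Wrap a model call with schema validation and automatic retries,
    so malformed tool calls are rejected instead of propagating errors
    down a long task chain.
    """
    last_err = None
    for attempt in range(max_retries):
        raw = model_fn(prompt if attempt == 0 else
                       f"{prompt}\n\nPrevious output was invalid ({last_err}). "
                       "Return ONLY valid JSON matching the schema.")
        try:
            call = json.loads(raw)
            validate(call)                 # raises ValueError on schema violations
            return call
        except (json.JSONDecodeError, ValueError) as e:
            last_err = e
    raise RuntimeError(f"model never produced a valid tool call: {last_err}")

# --- demo with a flaky stand-in model (invented for illustration) ---
outputs = iter(['not json at all', '{"tool": "search", "args": {"q": "KV cache"}}'])
flaky_model = lambda prompt: next(outputs)

def validate(call):
    if call.get("tool") not in {"search", "read_file"}:
        raise ValueError("unknown tool")
    if not isinstance(call.get("args"), dict):
        raise ValueError("args must be an object")

call = harness_call(flaky_model, "Find docs on KV cache reuse.", validate)
print(call["tool"])  # search
```

The retry prompt feeds the failure reason back to the model, which is the cheapest form of the "validators and strict tool protocols" Li mentions.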
Another point is: V4-Flash, as a vertical fine-tuning “sweet spot,” is very good. What is vertical fine-tuning? It’s using domain-specific data to “retrain” a general model, making it an industry expert.
Training a super-large model of 1.6 trillion parameters (via supervised fine-tuning or RL) is prohibitively expensive for most companies, while models of 200-300 billion parameters are the main size for post-training. We previously did post-training on Qianwen 235B (235 billion parameters), but its effect was weaker than the same size V4-Flash.
V4-Flash’s performance has caught up with the previous generation of trillion-scale open-source models, surpassing the 600B+ DeepSeek V3.2 and older Kimi models. It will become the preferred base for industry-specific fine-tuning.
🧑🏫 Chillin, founder of Coding Agent
Our internal evaluation concludes: in coding agent scenarios, DeepSeek V4 is at the level of Claude from over a year ago.
The issues may lie in two areas: parameter scale and data. DeepSeek still has a significant gap compared to Anthropic.
For real deployment, V4 still needs some special scaffolding, such as SWE-Agent (software engineering intelligent agent), OpenHands (an open-source coding agent), Claude Code, OpenClaw. These require additional configuration by developers.
🧑🏫 Chen Weipeng, founder and CEO of Yongyue Intelligence
Based on actual use of Loopit (Yongyue’s AI interactive content product), mainly in coding scenarios, it’s clear that DeepSeek V4 still has a gap compared to top foreign closed-source models in the stability and task completion rate of complex long-term tasks.
The ability gap among top domestic models is narrowing. This indicates that model competition is entering a new stage: in the agent era, whether models can understand long contexts, adapt to complex frameworks, and reliably complete long-term tasks at acceptable costs and speeds will become equally important.
The real difference is not just the model itself, but the overall system formed by the model, post-training, agent framework, evaluation system, and engineering efficiency.
🧑🏫 Song Chunyu, Vice President of Lenovo Group, Chief Investment Officer and Senior Partner of Lenovo Venture Capital
V4’s release did not include a native multimodal version (i.e., models capable of processing text, images, sounds simultaneously), which is somewhat regrettable in the current market environment.
But combined with its comprehensive embrace of domestic computing power, this is likely a strategic choice to focus resources on solving the core compute infrastructure problem first.
🧑🏫 Zero One All Things Technology and Product Center Vice President Zhao Binqiang
Calling it “below expectations” might be a bit nitpicking.
But from a ToC (consumer-oriented) perspective, productization still isn’t enough—Flash version’s capabilities in complex tasks like creation and programming are somewhat lacking; Pro version, while close to top-tier closed-source models, requires high initial compute power and has a high entry barrier.
Impact: AI is not simply getting cheaper
🧑🏫 Chen Weipeng, founder and CEO of Yongyue Intelligence
An important trend is that AI is not simply becoming cheaper.
The costs of calling upon the world’s most flagship models are actually rising, because they handle more complex, longer-context, higher-value tasks. The models that are truly getting cheaper quickly are mid-tier, open-source, and self-deployable models.
Therefore, future application companies will not only ask “which model is the strongest,” but will need to build a model scheduling system: which tasks must use the strongest model, which can use high cost-performance models, and where capabilities can be supplemented by agent frameworks and engineering systems.
DeepSeek V4’s significance lies in enriching the model supply layer.
For enterprises, it’s not just about replacing a certain overseas model, but enabling more flexible multi-model orchestration, self-deployment, and cost optimization.
The future barrier for AI applications will not be simply calling a model, but organizing models, agents, product scenarios, and data feedback into a reliable, low-cost, scalable production system.
For Loopit, this trend is crucial. We focus on AI interactive content; model capabilities determine creative limits, while cost and speed determine whether creation can scale.
Only when different levels of models are sufficiently usable and can be effectively orchestrated, can the vast number of user ideas be generated, interacted with, and propagated in real time. The progress of DeepSeek V4 will accelerate this process.
🧑🏫 Li Bojie, Chief Scientist of Pine AI
In the vertical fine-tuning market, models like Qianwen, Llama, etc., in the 200-300B range, are being systematically replaced by V4-Flash.
All teams doing post-training at this scale will reevaluate: V4-Flash outperforms models of the same size and had day-one adaptation across inference frameworks (SGLang/vLLM/TileLang). Within six months, it will become the default starting point for domestic open-source vertical models.
Huawei Ascend 950 SuperNode inference ecosystem has officially started, challenging NVIDIA’s chip premium.
This is the first fully operational “domestic chip + domestic top open-source model” solution (NVIDIA/AMD did not get early adaptation of V4). After large-scale shipments of 950 in the second half of the year, a wave of pure domestic inference replacements in agent long-context scenarios is expected.
This indirectly affects NVIDIA’s valuation and premium in the Chinese market—no sales collapse, but bargaining power is being re-priced.
The overall cost of using agents capable of handling complex long-term tasks has dropped significantly.
V4-Pro costs $1.74 per million input tokens (on cache miss) and $3.48 per million output tokens; combined with the 1M-context high-efficiency KV cache and MegaMoE, per-token cost is already down to 1/6 to 1/7 that of frontier models;
As long as the industry addresses tool-invocation stability and hallucination rate at the harness layer (validators, external grounding, strict schemas, consistency voting), the multi-step research, long-horizon code agents, and deep-search applications that were previously uneconomical will move from demos to real business in the second half of this year. The inflection point for agent economics is here.
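At the list prices quoted above, a back-of-envelope calculation shows why long-context agents become economical. The token counts for the hypothetical research run are invented for illustration.

```python
# Back-of-envelope agent cost at the V4-Pro list prices quoted above
# ($1.74 per 1M input tokens on cache miss, $3.48 per 1M output tokens).
# The token counts for a "deep research" run are invented for illustration.

IN_PRICE = 1.74 / 1e6    # USD per input token
OUT_PRICE = 3.48 / 1e6   # USD per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * IN_PRICE + output_tokens * OUT_PRICE

# A hypothetical multi-step research agent: 800K tokens read, 60K written.
cost = run_cost(800_000, 60_000)
print(f"${cost:.2f} per run")   # $1.60 per run
```

At a frontier model costing 6 to 7 times more per token, the same run would cost on the order of $10, which is the gap that separates a demo from a deployable product.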
Moreover, closed-source top-tier vendors will not lower prices because of this—they still lead significantly, and V4 does not threaten their pricing.
🧑🏫 Zero One All Things Technology and Product Center Vice President Zhao Binqiang
The core proposition for ToB AI applications is: achieve full-cycle cost control while ensuring effectiveness. The appearance of DeepSeek V4 offers a highly competitive solution.
Flash covers simple tasks, Pro covers high-complexity scenarios, and overall costs are significantly lower than mainstream closed-source solutions, enabling Zero One All Things to greatly improve cost-performance in delivery.
More importantly, DeepSeek's open-source stance is firm and unwavering: there will be no sudden pivot to closed source that strands adopters' investments. This steadfast open-source attitude provides valuable certainty for enterprise-level technology choices.
Zero One All Things has already fully launched product evaluation and capability verification based on DeepSeek V4, focusing on performance in production scheduling, intelligent office, investment management, and other core enterprise scenarios. Once verified, they will consider replacing existing models, allowing more industry clients to access top domestic large models.
After V4’s release, I believe three main industry changes will occur:
DeepSeek’s successful adaptation to Huawei Ascend signifies a substantial step forward in the “chip + framework + model + application” full-stack domesticization of AI.
For compliant government and enterprise clients, this is a necessity. The process of domestic substitution in the ToB market will accelerate significantly.
DeepSeek achieves near top-tier closed-source model performance at a fraction of the cost, setting an example that will further boost the overall performance of open-source models.
This will also pressure companies like Anthropic and OpenAI to reconsider their high pricing strategies. Industry profit centers will shift from base models to deep industry applications, which is very beneficial for long-term AI development.
Open source lowers the barrier to entry for base models, but the harness (the engineering layer) determines deployment success. Between a high-quality open-source model and a stable, reliable enterprise product sits a layer of engineering capability: hallucination elimination, instruction following, error checking, domain-knowledge injection, and more.
Different industries have different needs; no single harness is universal. This is precisely Zero One All Things' core advantage: through automatic evaluation, feedback, improvement, and domain-knowledge injection, it can quickly build industry-specific harness systems that make large models truly usable in business.
🧑🏫 Song Chunyu, Vice President of Lenovo Group, Chief Investment Officer and Senior Partner of Lenovo Venture Capital
First, a million-token context becomes standard at the application layer, driving the explosion of agents: V4 will bring ultra-long context capabilities down to accessible infrastructure.
Second, industry competition shifts from “model competition” to “application and data competition”: as top open-source models approach the performance of closed-source ones and costs drop sharply, the model itself will no longer be a scarcity barrier. Future investments and competition will focus on who can leverage these foundational models to build data and application closed loops in high-value verticals like healthcare, finance, and law, forming a business moat.
Third, the domestic compute power industry chain will see huge investment opportunities: V4’s success proves that large models can also shine on domestic chips. This will generate a strong demand for domestic compute power, fueling investments across the entire industry chain—from chip design and servers to cloud services.
We judge that “this year’s domestic compute power is equivalent to last year’s overseas compute power,” and the industry trend and capital market effects will be especially strong.
Resources will be focused on projects that can quickly commercialize, land in industries, and form product barriers, while long-term investments in underlying architecture and compute infrastructure will continue.
🧑🏫 A dual-currency fund investor
My wish this year is that the foundation-model companies in our portfolio go public successfully.
Once DeepSeek starts raising, it will absorb a large amount of primary-market (especially state-owned) capital. For the remaining core model companies that haven't IPO'd, continuous rolling financing is unsustainable.
I also hold a somewhat pessimistic view: application-layer financing will be difficult this year.
Core model capabilities are still rapidly iterating, meaning many applications will be overturned. Just like last year’s hot Coding and Workflow, they are no longer mentioned much in the primary market this year.
🧑🏫 Chillin, founder of a Coding Agent startup
Open source is a good thing, and DeepSeek V4 can further promote communication and optimization. But the timing gap behind the frontier is large, which is somewhat frustrating;
V4 will push model vendors to face scale and data issues more directly, but these are extremely difficult problems—mainly capital-related.
It also further proves the limits of Scaling Law. Performance leaps from engineering are limited, forcing everyone to seek more fundamental solutions. The road is long and arduous.
Bonus: A practical guide to DeepSeek V4
What is it suitable for?
Coding and learning: If you are a beginner in programming or need to write personal scripts, DeepSeek V4 is one of the top choices. It reliably understands context, generates high-quality code, and excels at debugging.
Chinese and CJK content creation: Whether writing articles, polishing copy, or translating, V4 performs exceptionally well in Chinese, Japanese, and Korean environments.
Reading and analyzing ultra-long texts: V4 natively supports up to 1 million token context windows. You can feed it entire books, lengthy reports, or complete codebases at once, and it will help summarize or extract key information.
What is it not suitable for?
Searching and verifying objective facts: V4 is a “reasoning model,” not an “encyclopedia.” Its recall of factual knowledge (like historical details or specific entity info) is weak, and hallucinations are common. Especially the V4-Flash version scores only 34.1% on factual QA tests. Do not use it as a search engine; verify facts with other AI tools that have search functions or by manual checking.
Processing images or document layout: DeepSeek V4 is a pure text model, with no vision input or output support. For analyzing charts or images, use other multimodal models (like GPT-5.4 Mini).
Pure English advanced creative writing: While it can write in English, its output can sometimes be stiff. If you need highly natural, idiomatic, or creative pure English content, consider other mainstream Western models.
Other notes:
Allow ample thinking space: if you're using the Pro version with explicit chain-of-thought (CoT) reasoning, encourage it on difficult problems to "think a few more steps" or enable "Think Max" mode. The deeper its reasoning, the more accurate the answer.
Tolerate occasional verbosity: evaluations show V4 is relatively “verbose” and slower in output. If you prefer brief answers, explicitly request “answer in one sentence” or “be as concise as possible.”
We welcome your thoughts and feedback!