TL;DR: Nous Research has released Hermes 4, an open-weight language model that benchmarks above ChatGPT (GPT-4o-class) on several key evaluations — while shipping without the content restrictions that govern commercial models. Available as a free download on Hugging Face, Hermes 4 runs locally on consumer hardware and is already drawing attention from developers who want capable AI without API gatekeeping, usage policies, or per-token costs.
Nous Research just shipped the model that the open-source community has been waiting for — one that does not beg for permission. Hermes 4 is unrestricted, locally runnable, and by several benchmark measures, more capable than the ChatGPT experience that hundreds of millions of users treat as the default definition of AI. That combination is not an accident. It is a deliberate design philosophy that has made the Hermes series the most downloaded fine-tuned model family in the open-weight ecosystem — and it is about to make the proprietary AI providers very uncomfortable.
What you will learn
- What Hermes 4 is — architecture, parameters, and release details
- Benchmark results: where Hermes 4 outperforms ChatGPT and where it does not
- No content restrictions: what this means for developers and end users
- Who is Nous Research and what is the Hermes lineage
- The open-source AI landscape: Meta, DeepSeek, Mistral, and Hermes
- The safety versus accessibility debate around unrestricted models
- Developer use cases and fine-tuning potential
- Competitive pressure on proprietary model pricing
- Where the open-source AI movement is heading
- Frequently asked questions
What Hermes 4 is
Hermes 4 is an open-weight large language model released by Nous Research in March 2026. It is built on a base architecture consistent with the Llama 3.x generation of models and has been substantially fine-tuned using Nous Research's proprietary training data and reinforcement learning from human feedback (RLHF) methodology — the same approach that produced the previous Hermes 3 series.
The model is available in multiple parameter sizes to serve different hardware configurations. The flagship variant is a 70-billion parameter model that requires multi-GPU or datacenter-class hardware to run at full precision, with quantized versions that fit across two 24GB consumer GPUs, or on a single 24GB card at aggressive quantization levels with partial CPU offload. Smaller 8B and 14B variants allow Hermes 4 to run on a single modern GPU or even on Apple Silicon MacBook Pro hardware through tools like Ollama and LM Studio.
What distinguishes Hermes 4 from a typical open-weight release is the combination of factors Nous Research has optimized for simultaneously: instruction-following accuracy, long-context performance, tool-use and function-calling capability, and the absence of refusal behaviors. Each is addressed in the fine-tuning pipeline; none is treated as secondary.
The release is on Hugging Face under NousResearch, with weights, quantized variants (GGUF and AWQ), and system prompt documentation. No API, no account, no per-token cost.
Nous Research has been transparent that Hermes 4's training used a large synthetic dataset generated through "Distilabel" — a pipeline that creates high-quality instruction-following examples at scale — combined with curated open datasets and targeted RLHF tuning.
Benchmark results
Nous Research published benchmark comparisons alongside the Hermes 4 release, comparing the 70B model against GPT-4o, Claude 3.5 Sonnet, and other frontier and open-weight models. As always with self-reported benchmarks, directional signals matter more than precise rankings, and independent replication is required before treating any specific number as definitive.
The evaluations where Nous Research reports Hermes 4 70B outperforming GPT-4o include instruction-following benchmarks, reasoning tasks at the 70B parameter scale, and the IFEval (Instruction Following Evaluation) suite, where Hermes 4 shows particularly strong performance in following complex, multi-constraint instructions without deviation or unnecessary refusals.
The headline result — that Hermes 4 beats ChatGPT — is accurate but requires context. Hermes 4 outperforms GPT-4o on instruction-following tasks specifically, which is a meaningful benchmark for developers building applications that need precise, format-correct outputs from an AI system. However, GPT-4o leads Hermes 4 on code generation, mathematical reasoning, and general knowledge tasks.
The comparison against GPT-4o rather than GPT-5.x is also important to note. OpenAI's ChatGPT product in March 2026 defaults to GPT-5.3 Instant or GPT-5.4 for paid subscribers, not GPT-4o. When compared against these newer OpenAI models, Hermes 4 does not hold an equivalent edge. The "outperforms ChatGPT" claim holds for specific benchmark categories against the GPT-4o baseline — not against the current frontier.
For the open-source category specifically, Hermes 4 represents a meaningful step. Against the Llama 3.3 70B base model it is likely built on, Hermes 4 shows substantial improvements from fine-tuning. Against Mistral Large and comparable open-weight models, Hermes 4 is competitive across most categories and leads on the instruction-following dimension that matters most for application developers.
Third-party evaluations from the community — typically appearing within days on LM Arena and the Hugging Face Open LLM Leaderboard — will provide the clearest independent signal on where Hermes 4 actually sits.
No content restrictions
The feature of Hermes 4 that generates the most discussion — and the most concern — is its release without the content filtering and refusal behaviors that govern commercial models like ChatGPT, Claude, and Gemini.
In practice, this means Hermes 4 will respond to queries that commercial models refuse. It will write content involving violence, explicit material, morally complex scenarios, or legally sensitive information without inserting disclaimers, redirecting to mental health resources, or declining based on harm assessments. For the commercial AI ecosystem, these refusal behaviors are not bugs — they are deliberate design choices made by Anthropic, OpenAI, and Google based on legal liability concerns, brand reputation management, and genuine safety considerations.
Nous Research's position is that these restrictions impose real costs on legitimate users. Researchers studying extremist rhetoric need to model that rhetoric. Security professionals need AI that will engage with vulnerability details. Novelists need models that will write morally complex fiction without sanitizing it. Medical professionals need frank discussions of drug interactions, dosing thresholds, and risky procedures without a model treating them as potential bad actors.
The Hermes series has operated from this philosophical position since its first release. It is not new to Hermes 4. What is new is that the model is powerful enough to make the tradeoff explicit in a way that earlier, less capable versions could not. Hermes 4 at 70B parameters is a genuinely capable model — competitive with frontier commercial offerings on core benchmarks — and it operates without guardrails. That combination raises the stakes of the access question considerably.
Nous Research does not distribute Hermes 4 with intent to enable harm. What the release does is shift the decision about how to use the model entirely to the person running it. Operators can implement their own content policies at the application layer — the point is that guardrails are the operator's responsibility rather than baked into the model weights.
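That division of responsibility can be made concrete. The sketch below is a minimal, hypothetical application-layer policy wrapper: since the model itself will not refuse, any filtering happens in the operator's code. The `generate` callback, the blocked-pattern list, and the refusal message are all placeholders for whatever inference backend and policy an operator actually chooses.

```python
import re
from typing import Callable

# Hypothetical operator-defined policy. The unrestricted model will not
# refuse on its own, so enforcement lives here, at the application layer.
BLOCKED_PATTERNS = [
    re.compile(r"\bcredit card number\b", re.IGNORECASE),
]

REFUSAL_MESSAGE = "This request is outside this application's usage policy."

def policy_wrapped_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Apply the operator's content policy before and after inference."""
    if any(p.search(prompt) for p in BLOCKED_PATTERNS):
        return REFUSAL_MESSAGE          # refuse before spending compute
    response = generate(prompt)         # unrestricted model call
    if any(p.search(response) for p in BLOCKED_PATTERNS):
        return REFUSAL_MESSAGE          # filter the output as well
    return response
```

In practice an operator would swap the regex list for a proper moderation classifier, but the shape is the same: the policy is code the operator owns, not behavior baked into the weights.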
Nous Research background
Nous Research is a small AI research organization with an outsized footprint in the open-source model ecosystem. Founded in 2022, the organization operates with a lean team and a clear focus: producing the best fine-tuned open-weight models available, distributed freely to the community.
The Hermes lineage began with Hermes 1, released in late 2023 as a fine-tune of the original Llama 2 model. It immediately distinguished itself from other community fine-tunes by its instruction-following quality — the model was noticeably better at following complex, multi-part instructions without deviation than comparable open-source alternatives. The AI developer community adopted it quickly, and Hermes 1 became one of the most downloaded models in the Hugging Face ecosystem within weeks of release.
Hermes 2 (2024), fine-tuned on Mistral 7B and Llama 3 variants, added function-calling and tool-use support. Hermes 3 (2025) on Llama 3.1 added long-context capability. Each version built on community reception and developer feedback.
Nous Research's funding model is deliberately minimal — community support, grants, and selective commercial relationships rather than venture capital. This keeps the organization independent of the pressure to commercialize or restrict model access that comes with institutional funding.
The organization operates largely in public, sharing training details, dataset compositions, and model architecture decisions through blog posts and community channels rather than behind embargos. It is a model of open AI development that stands in deliberate contrast to the secrecy that now characterizes the major commercial labs.
The open-source AI landscape
Hermes 4 arrives at a moment when the open-source AI ecosystem is demonstrably competitive with proprietary frontier models for the first time in the technology's short history.
The shift began with Meta releasing Llama model weights in 2023, a move initially dismissed as too weak to matter. Three years on, the Llama line has become the foundation for thousands of fine-tuned variants, including Hermes 4.
DeepSeek R1, released in early 2025 with full weights and a permissive license, matched or exceeded OpenAI's o1 on several mathematical and scientific benchmarks at a fraction of the training cost — proving that open-weight models could reach frontier reasoning capability, not just competitive general performance.
Mistral AI has consistently pushed the boundary of what small parameter counts can achieve. The Mixtral series showed that mixture-of-experts architectures could dramatically improve efficiency, and Mistral's Apache 2.0 licensed releases make them freely usable in commercial applications. Alibaba's Qwen series adds multilingual capability from a non-Western lab — a signal that the open-weight ecosystem is genuinely global in a way the commercial frontier model market is not.
Against this backdrop, Hermes 4 represents the fine-tuning layer of the open-source stack coming of age. Expert fine-tuning now produces results genuinely competitive with commercial frontier offerings in specific benchmark categories — demonstrating that a small, independent team with the right methodology can close the gap that commercial labs assumed would remain their structural advantage.
Safety versus accessibility
The most contested dimension of Hermes 4's release is the explicit choice to ship without content restrictions — and the debate it reignites about whether AI safety and open access are fundamentally in tension.
The commercial AI labs have settled on a position: powerful models require content filtering, refusal behaviors, and usage policies for responsible deployment — commitments formalized in documents like Anthropic's Responsible Scaling Policy.
The counter-position, which Nous Research represents, is that centralized content control serves commercial providers' interests — protecting them from liability and regulatory scrutiny — at the cost of paternalistic restrictions on users with legitimate purposes. Researchers studying extremist rhetoric, security professionals analyzing vulnerabilities, medical professionals needing frank clinical discussions: all are hindered by commercial refusals.
Both positions have merit. Commercial AI models do over-refuse at rates that impose real costs on legitimate users. At the same time, unrestricted models do get used to generate harmful content — harassment campaigns, non-consensual imagery, and social engineering scripts are documented misuses.
The honest framing is not "safe versus unsafe" but "who controls the safety decisions." Hermes 4 puts that decision entirely with the person running the model. Nous Research's implicit answer is that the balance of legitimate versus harmful use cases favors release — and that restricting powerful models does not stop determined bad actors, it just adds friction for everyone else. This central tension in open-source AI development will not be resolved by Hermes 4; it will only intensify as open-weight capability grows.
Developer use cases
For the developer community that has driven Hermes series adoption, Hermes 4 opens up a set of use cases that were either unavailable or prohibitively expensive with commercial APIs.
Local inference at zero marginal cost. No per-token fee. For high-volume applications — consumer chatbots, internal tools, research pipelines — the cost difference between local inference and commercial API pricing can be an order of magnitude.
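The order-of-magnitude claim is easy to check with back-of-the-envelope arithmetic. The token volume and per-million-token rate below are illustrative assumptions (the rate sits at the low end of the commercial pricing range cited elsewhere in this article), and local inference still carries fixed hardware and electricity costs that this sketch deliberately simplifies to an amortized GPU purchase.

```python
# Illustrative monthly volume for a high-traffic internal tool (assumption).
tokens_per_month = 500_000_000          # 500M tokens

# Commercial API: pay per token (rate is an assumed example, in USD).
api_price_per_million = 2.50
api_cost = tokens_per_month / 1_000_000 * api_price_per_million

# Local inference: zero marginal cost per token; amortize hardware instead.
gpu_cost = 3_000.0                      # assumed one-time GPU purchase
amortization_months = 24
local_cost = gpu_cost / amortization_months

print(f"API:   ${api_cost:,.0f}/month")     # $1,250/month
print(f"Local: ${local_cost:,.0f}/month")   # $125/month amortized hardware
```

Under these assumptions the gap is exactly 10x, and it widens as volume grows, since the local cost is flat while the API bill scales linearly with tokens.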
Privacy-sensitive applications. Data processed locally never leaves the premises. This is directly relevant for healthcare (HIPAA) and legal (attorney-client privilege) applications where transmitting data to a commercial API carries regulatory risk.
Fine-tuning flexibility. Weights are available for further fine-tuning. A legal tech company can adapt on case law. A medical provider on clinical notes. A customer service platform on proprietary interaction history. None of this is possible with commercial models where weights are not released.
Instruction-following precision. For applications that require exactly-formatted outputs — JSON schemas, structured reports, specific template adherence — Hermes 4's IFEval benchmark performance is directly relevant. Developers building applications where the model's output is consumed programmatically need a model that reliably follows formatting instructions without inserting extraneous commentary or deviating from the specified format.
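When model output is consumed programmatically, validation matters more than benchmark scores. A minimal sketch, assuming a hypothetical extraction task where the model must return a bare JSON object with exactly the keys `name` and `priority` (the schema is invented for illustration):

```python
import json

REQUIRED_KEYS = {"name", "priority"}    # hypothetical output schema

def parse_structured_output(raw: str) -> dict:
    """Parse a model response that is required to be a bare JSON object.

    Raises ValueError on any deviation, so malformed outputs fail loudly
    instead of silently corrupting downstream data.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        raise ValueError(f"expected a JSON object with keys {sorted(REQUIRED_KEYS)}")
    return data
```

A model with strong instruction following fails this check less often, for example by not prefacing the JSON with "Sure! Here is the result:", which is what IFEval-style numbers translate to in practice.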
Research and red-teaming. Security researchers and AI safety teams that need to study model behavior — including harmful outputs — need access to models that will produce those outputs in a controlled environment. Commercial models' refusal behaviors block this research. Hermes 4's unrestricted nature makes it a natural tool for AI safety research that requires studying failure modes.
Competitive pressure on proprietary pricing
When GPT-3.5 was the most capable widely accessible model in late 2022, the open-source performance gap was large enough that paying for API access was rational even for cost-sensitive developers. That gap has narrowed substantially. After Nous Research's fine-tuning, Hermes 4 is competitive with GPT-4o on specific benchmark categories — and it runs locally at zero marginal cost.
The business model of charging $2.50–$15.00 per million input tokens depends on maintaining a meaningful capability premium over free alternatives. As that premium narrows, the pricing justification shifts toward reliability, support, latency, and safety guarantees rather than raw capability. For enterprise customers — who care about SLAs, enterprise agreements, and indemnification — OpenAI, Anthropic, and Google have structural advantages Hermes 4 does not challenge.
For developers, startups, and researchers, the calculus is different. These users have less concern about enterprise SLAs and more sensitivity to cost and flexibility. Hermes 4 targets exactly this segment, and each capability improvement narrows the justification for paying commercial API rates. OpenAI's pricing has already dropped substantially from 2023 to 2026 in response to open-source competitive pressure — and Hermes 4 accelerates that dynamic further.
Where open-source AI is heading
Open-weight models will continue closing the capability gap with proprietary frontier models. Base model releases from Meta and DeepSeek provide increasingly capable foundations; fine-tuning techniques like DPO and synthetic data generation become more efficient; the community of researchers grows as tools become accessible.
The question is not whether open-weight models will match commercial frontier capability in most categories — they will. The question is what happens when that parity is reached.
One scenario is commoditization: GPT-5.x class capability freely available in open-weight form pushes commercial providers toward the enterprise segment — SLAs, compliance, and integration — where open-source cannot easily compete. The competitive moat becomes the next training run, not the current one.
Another scenario is divergence: commercial providers maintain advantages through training runs only large organizations can afford. This requires that the relationship between compute and capability stays steep — a relationship DeepSeek's low-cost training runs have shown is not guaranteed.
The AI ecosystem in 2026 is more competitive, more diverse, and more accessible than two years ago. The tension Hermes 4 sharpens — between capability access and safety governance — will be resolved not in model release decisions but in the regulatory frameworks and community standards that are still being built.
Frequently asked questions
Where can I download Hermes 4?
Hermes 4 weights are available on Hugging Face under the NousResearch organization. Multiple quantized variants (GGUF, AWQ) are available for different hardware configurations. There is no registration or approval process required — the weights download like any other open-source software package.
What hardware do I need to run Hermes 4?
The 70B flagship model requires significant hardware: multi-GPU server setups for full-precision (FP16) inference, two 24GB GPUs for 4-bit quantized variants, or a single 24GB GPU with partial CPU offload at more aggressive quantization levels. The 8B variant runs on a single modern consumer GPU (RTX 3080 class) or Apple Silicon M2/M3 hardware. Tools like Ollama, LM Studio, and llama.cpp handle quantization and hardware optimization automatically and are the recommended starting point for most users.
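These figures follow from simple arithmetic: weight memory is roughly parameter count times bytes per weight, plus overhead for activations and the KV cache. A rough estimator (the flat 20% overhead factor is an assumption; real usage varies with context length and backend):

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 0.20) -> float:
    """Rough VRAM needed for model weights plus a flat overhead factor."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9   # decimal GB

# FP16 70B: far beyond any consumer card.
print(round(vram_estimate_gb(70, 16), 1))   # 168.0 GB
# 4-bit quantized 70B: fits across two 24GB GPUs.
print(round(vram_estimate_gb(70, 4), 1))    # 42.0 GB
# 4-bit quantized 8B: comfortable on one consumer GPU.
print(round(vram_estimate_gb(8, 4), 1))     # 4.8 GB
```

The same formula explains why the 8B and 14B variants are the practical choice for single-GPU and laptop deployments.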
Is Hermes 4 actually better than ChatGPT?
On instruction-following benchmarks specifically, Hermes 4 70B outperforms GPT-4o. On code generation, mathematical reasoning, and general knowledge, GPT-4o and current-generation ChatGPT (GPT-5.x) maintain leads. The "better than ChatGPT" claim is accurate for specific benchmark categories against the GPT-4o baseline — not across-the-board capability or against the current frontier. For most everyday use cases, the practical difference is smaller than the benchmark numbers suggest in either direction.
Does Hermes 4 have content restrictions?
No. Hermes 4 is released without the content filtering and refusal behaviors of commercial models. It will respond to queries that ChatGPT, Claude, and Gemini decline. Nous Research positions this as respecting user autonomy and enabling legitimate use cases. Critics argue it enables harmful uses. Both observations are accurate; the tradeoff is a philosophical and practical choice that each person deploying the model makes for their specific context. Operators can implement application-layer content policies on top of Hermes 4 if their use case requires them.
Can Hermes 4 be fine-tuned further?
Yes. The model weights are available under a permissive license that allows further fine-tuning. Developers can adapt Hermes 4 for domain-specific applications using standard fine-tuning toolkits like Axolotl, Unsloth, or the Hugging Face TRL library. This is one of the primary reasons organizations choose open-weight models for production deployments.
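As one illustration of what domain adaptation looks like in practice, here is a minimal Axolotl-style configuration sketch for a QLoRA fine-tune. The repo id and dataset path are placeholders, not confirmed names from the release, and a real config needs more fields (sequence length, learning rate schedule) than shown here.

```yaml
# Hypothetical QLoRA fine-tune config sketch (Axolotl-style).
base_model: NousResearch/Hermes-4-8B    # placeholder repo id
load_in_4bit: true                      # QLoRA: quantize the base weights
adapter: qlora
lora_r: 16
lora_alpha: 32
datasets:
  - path: ./my_domain_data.jsonl        # operator's own instruction data
    type: chat_template
num_epochs: 3
micro_batch_size: 2
gradient_accumulation_steps: 8
output_dir: ./hermes4-domain-adapter
```

QLoRA keeps the quantized base weights frozen and trains small adapter matrices, which is why domain adaptation of this kind fits on the same consumer hardware used for inference.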
How does Hermes 4 compare to Mistral and DeepSeek models?
Hermes 4 is competitive with Mistral Large and DeepSeek comparable-parameter models across most benchmark categories, with a specific edge on instruction-following tasks. DeepSeek's R1 series maintains advantages in mathematical and scientific reasoning. Mistral's Mixtral series offers strong performance at lower parameter counts through mixture-of-experts architecture. All three represent the maturation of the open-weight model ecosystem into genuine frontier-competitive capability.
What is Nous Research's business model?
Nous Research operates as a research organization supported by community contributions, grants, and selective commercial relationships — not venture capital. This structure removes the commercial incentive to restrict access that shapes decisions at VC-backed AI companies, and keeps the organization aligned with the open-source community that improves each successive Hermes release.
For context on the current frontier commercial model landscape that Hermes 4 is benchmarked against, see our coverage of GPT-5.4's launch and benchmark results and the DeepSeek competitive pressure on OpenAI's claims.