Open source video AI: LTX 2.3 and Helios, March 2026
The March open source video AI releases, led by LTX 2.3 and Helios, hit 83.1 on VBench. Run 10 hours of video for $0.19 versus $1,440 on Sora. Here's what changed.
TL;DR: Four open source video AI models shipped in 48 hours ending March 10, 2026: LTX-Video 2.3 by Lightricks (13B parameters, 83.1 VBench score, 8-second 720p generation on an RTX 4090), Helios (81.7 VBench, native 1080p), HunyuanVideo v2 (Tencent), and CogVideoX 5B (ZhipuAI). Sora sits at an estimated 87-89 on VBench and charges $0.04 per second. LTX 2.3 runs locally for roughly $0.19 per 10 hours of video. That 7,500x cost gap is the real story.
When OpenAI showed Sora in February 2024, the creative industry split into two camps. One half was amazed. The other half was scared. Both halves assumed the same thing: OpenAI had a multi-year lead that open source couldn't close.
Twenty-five months later, that assumption is wrong.
On March 9, 2026, Lightricks published LTX-Video 2.3 on HuggingFace. The next day, an anonymous lab pushed Helios. Within the same 48-hour window, Tencent dropped HunyuanVideo v2, and ZhipuAI released CogVideoX 5B. Four complete model releases, each with publicly available weights, inference code, and ComfyUI support.
The open source video AI field went from "promising" to "production-ready" in two days.
This is not the first time a creative AI market turned this way. Stable Diffusion arrived in August 2022 as a capable open source image model. What followed wasn't a gradual adoption curve. It was a wave: Automatic1111, ControlNet, IP-Adapter, LoRA fine-tuning, community checkpoints, and a tooling community so vast and fast-moving that no closed-source company has matched it. Midjourney survived by building for the art community with a Discord-first product. Adobe adapted by acquiring Firefly and building into existing workflows. Everyone who relied purely on generation quality as a moat got crushed.
The same cycle is starting now, for video. LTX 2.3 and Helios are the Stable Diffusion moment for video AI, and the inflection point is arriving roughly 18 months faster than its image-AI equivalent because the community is bigger, the infrastructure is more mature, and the hardware is more accessible.
The broader open source AI story this quarter extends beyond video. But video is where the cost differential is most dramatic, and where paid services will feel pressure first.
LTX-Video 2.3 is a Diffusion Transformer (DiT) architecture video model with 13 billion parameters, released by Lightricks on March 9, 2026 under a commercially permissive license.
Lightricks is not a new AI lab. They built Facetune and LightLeap, products used by hundreds of millions of people for image and video editing on mobile. Their open source video work is a deliberate strategic play: get professional creators dependent on open LTX models before those creators build pipelines around Sora or Runway.
The 2.3 release ships with two variants: the full 13B model for maximum quality and a 2B distilled variant for creators with less VRAM who can accept a small quality tradeoff.
LTX 2.3 treats video as a unified sequence of spatial and temporal tokens rather than processing each frame independently. This is the same architectural paradigm Sora uses, and it's why temporal consistency on LTX 2.3 is substantially better than earlier open source video models that handled frames separately.
The practical effect: objects move coherently between frames. A person's face doesn't drift. A car's color stays consistent. Motion follows physics at a level previous open source models couldn't sustain across 5-second clips.
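To make the tokenization idea concrete, here is a minimal PyTorch sketch of spatiotemporal patchification. The patch sizes and tensor layout are illustrative assumptions, not LTX 2.3's actual configuration, which lives on its model card.

```python
import torch

def patchify(video: torch.Tensor, pt: int = 2, ph: int = 8, pw: int = 8) -> torch.Tensor:
    """Turn a clip (T, H, W, C) into one flat sequence of spacetime tokens."""
    T, H, W, C = video.shape
    tokens = (
        video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
        .permute(0, 2, 4, 1, 3, 5, 6)      # group the three patch axes together
        .reshape(-1, pt * ph * pw * C)     # one row per spacetime patch
    )
    return tokens

clip = torch.randn(16, 720, 1280, 3)       # 16 frames of 720p RGB
print(patchify(clip).shape)                # torch.Size([115200, 384])
```

Because attention runs over this single sequence, the model learns appearance and motion jointly, which is what keeps a face from drifting between frames.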
The model supports video up to 60 seconds in length on capable hardware, though the practical ceiling for most RTX 4090 setups is 15-20 seconds before VRAM becomes a constraint. For 5-second clips at 720p, generation time is approximately 8 seconds. The 2B distilled variant does the same in roughly 3 seconds.
Sora generates video on OpenAI's cloud infrastructure, with wait times that vary by server load and subscription tier. You're at the mercy of queue position. LTX 2.3 runs locally, iteratively, without queue times.
That iteration speed matters more than most people initially realize. Professional video production isn't "prompt once, use result." It's prompt, review, adjust, re-prompt, review, adjust, repeat. On Sora, each iteration loop costs time and money. On LTX 2.3 locally, each loop costs 8 seconds of compute. The creative workflow works differently.
LTX 2.3 went viral within hours of release. The model quality matters, but the ComfyUI integration is what drove immediate adoption. Lightricks maintains an official ComfyUI-LTXVideo repository with pre-built workflows covering the most common generation tasks.
Within 24 hours of release, the ComfyUI community had published dozens of derivative workflows. Portrait animation. Product visualization. Architectural walkthrough. Cinematic B-roll generation. Existing Stable Diffusion creators could integrate video generation into workflows they already understand, using tools they already own, in minutes.
This is not a coincidence. This is Lightricks' distribution strategy. They know the Stable Diffusion community is the fastest-adopting creative AI community on earth, and they built their official integration to slot directly into that community. The result: LTX 2.3 had more active community workflows within a week than most commercial video platforms have accumulated in months.
LTX 2.3 scores strongest on photorealistic output across VBench dimensions, and its best use cases follow from that: talking head videos with consistent character identity, short-form product demos, B-roll footage for documentary and journalism, and abstract visual content for music and social media. The model handles both photorealistic and stylized prompts, with stylized outputs sometimes benefiting from the community's fine-tuned LoRA checkpoints that began appearing within days of release.
Where it still shows strain: very long clips (15+ seconds at high motion), complex multi-person scenes with significant interaction, and physics scenarios requiring simulation-level accuracy. For these use cases, Sora's quality advantage remains real and visible.
Helios is the most intriguing release of the week. Partly because of what it can do, and partly because nobody knows who made it.
Helios is the first open source video AI model with native 1080p output, scoring 81.7 on VBench, released March 10, 2026 by an unidentified lab.
The model appeared on HuggingFace under an organization called "Helios Research" with no founding team information, no company website, and no disclosed affiliation. The technical model card is detailed enough to suggest genuine research depth, with architecture specifications and training methodology described at a level of rigor that matches published academic work. But the authors are completely anonymous.
Community reaction split predictably between admiration for the output quality and genuine curiosity about the motives behind the anonymity. The Reddit thread on r/StableDiffusion calling it "the model that doesn't want to be found" has over 2,000 upvotes. That framing likely accelerated organic sharing more than any announcement strategy could have.
Most open source video models were architected for 720p and retrofitted higher resolution as an optional high-quality extension. The results of that approach are visible in output quality: artifacts at edges, inconsistent sharpness, and spatial coherence that degrades when upscaled.
Helios appears to have been designed from the start for 1080p output. Its spatiotemporal patch embedding size is optimized for the 1080p pixel space rather than scaled up from 720p. For professional users who deliver broadcast or streaming footage, this closes the single most visible gap between open and closed video models. At 1080p native, Helios output doesn't require post-processing upscaling that introduces artifacts.
The 1080p capability doesn't come at the cost of temporal consistency. VBench dimension scores for motion smoothness and subject consistency remain comparable to the best 720p open models. The model appears to have been trained specifically to maintain coherence at higher resolution, not just to generate more pixels.
Three theories circulate in the community. First: a stealth lab preparing for a commercial reveal, using the open source release to build community before a product announcement. Second: an academic team avoiding IP complications that would arise from institutional affiliation, releasing the model pseudonymously to preserve publication options. Third: a well-resourced individual or small team building genuinely for the community with no commercial intent.
None of these can be confirmed or ruled out with current information. What the anonymity does confirm is that video AI capability has reached a point where a small, unannounced team can build a world-class model and release it without needing a press strategy, venture backing, or institutional affiliation. That fact alone has implications for how quickly the field will keep moving.
The other two releases in the 48-hour wave serve different needs and deserve distinct treatment.
Tencent's HunyuanVideo has been one of the strongest open source video models since its initial release in late 2024. The v2 update focuses specifically on temporal consistency, the dimension where AI video most visibly fails for most users.
The problem temporal consistency solves is fundamental: making objects, faces, and environments behave as if they exist in a continuous physical world across the duration of the clip. Early video AI models failed here in ways that were immediately obvious to any viewer. Objects would shift color. Faces would morph between frames. Backgrounds would ripple or change geometry without cause.
HunyuanVideo uses a 3D attention architecture that processes spatial and temporal relationships simultaneously, and the v2 update refines the motion planning component of that architecture. The model card notes improvements to long-range motion coherence, meaning objects maintain consistent behavior not just frame-to-frame but across the full clip duration.
For developers building production pipelines, HunyuanVideo v2 is currently the most mature option for clips requiring sustained physical coherence. The model sits at approximately 82.5 on VBench, placing it between LTX 2.3 and Helios in the open source leaderboard.
Tencent releasing this as open source follows its broader pattern with AI: the company's strategy has consistently included open source releases as a way to build community influence in markets where closed products face regulatory barriers.
At 5 billion parameters, CogVideoX 5B is the accessible entry point in this release wave. It runs on hardware configurations that 13B and larger models cannot: mid-tier GPUs with 16GB VRAM, and cloud instances far cheaper per hour than the larger models require.
This accessibility matters. Most professional video AI use cases don't require 13B parameters to produce usable output. Short social media clips, simple product demonstrations, and basic promotional content all fall within CogVideoX 5B's quality range. For development teams prototyping video AI features who need fast iteration without expensive hardware, 5B at 79.8 VBench is a practical starting point.
The gap between CogVideoX 5B and LTX 2.3 (79.8 vs 83.1 VBench) is measurable but not dramatic for most consumer applications. For professional broadcast use, the difference matters. For Instagram Reels or TikTok content, it often doesn't.
VBench, developed by researchers at Nanyang Technological University, is the field's primary standardized benchmark for video generation quality. It measures generated video across 16 dimensions: subject consistency, background consistency, temporal flickering, motion smoothness, dynamic degree, aesthetic quality, imaging fidelity, and others that together capture what humans recognize as high-quality video.
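As a rough illustration of how a headline number arises, VBench scores each dimension separately and aggregates them. The per-dimension values and uniform weighting below are placeholders, not VBench's published weights or any model's real results:

```python
# Placeholder per-dimension scores (0-1), NOT real VBench results.
dims = {
    "subject_consistency": 0.96,
    "background_consistency": 0.95,
    "temporal_flickering": 0.97,
    "motion_smoothness": 0.98,
    "dynamic_degree": 0.61,
    "aesthetic_quality": 0.64,
    "imaging_quality": 0.68,
}

# VBench uses per-dimension weights; a plain mean is a stand-in here.
overall = 100 * sum(dims.values()) / len(dims)
print(f"overall: {overall:.1f} / 100")     # overall: 82.7 / 100
```

The takeaway: one weak dimension (here, the placeholder dynamic degree) can drag down an otherwise strong model, which is why dimension-level scores matter more than the headline number for any specific use case.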
Key finding: the best open source video AI models now match Runway Gen-3 Alpha on VBench, while Sora maintains a 4-6 point lead.
The full comparison as of March 2026:
| Model | VBench score | Max resolution | Open source | Pricing |
|---|---|---|---|---|
| Sora (OpenAI, estimated) | 87-89 | 1080p | ✗ | $0.04/sec API |
| Kling (Kuaishou) | ~85 | 4K | ✗ | Credit-based |
| Runway Gen-3 Alpha (estimated) | ~83.5 | 720p/1080p | ✗ | $12/mo + credits |
| LTX-Video 2.3 (Lightricks) | 83.1 | 720p | ✓ | Free (local) |
| HunyuanVideo v2 (Tencent) | ~82.5 | 720p | ✓ | Free (local) |
| Helios | 81.7 | 1080p native | ✓ | Free (local) |
| CogVideoX 5B (ZhipuAI) | ~79.8 | 720p | ✓ | Free (local) |
The counterintuitive result: LTX-Video 2.3 at 83.1 effectively matches Runway Gen-3 Alpha at an estimated 83.5, while Runway charges subscription fees and per-credit generation costs. The open source model is free. The quality is equivalent.
A 4-6 point VBench gap sounds decisive until you understand what those points represent in practice. The dimensions where Sora excels are specific:
Multi-scene consistency across very long clips (beyond 30 seconds). Physics simulation accuracy for complex interactions where objects need to behave according to real-world mechanics. Fine-grained prompt adherence on abstract or metaphorical inputs where the model needs to interpret creative direction rather than execute explicit instructions.
These capabilities matter most for cinematic production work, demanding narrative sequences, and technically complex simulation content. They matter considerably less for the applications that drive commercial video AI volume: social media content, product demonstrations, marketing B-roll, and short promotional clips.
For a brand team producing 30-second product spots, the practical difference between VBench 83 and VBench 88 is often invisible to the end client. The Ars Technica video AI benchmark analysis notes that real-world viewer preference ratings diverge from VBench rankings for short-form content under 15 seconds, with the correlation weakening below 10 seconds. Most social media content is under 10 seconds.
For the 80% of commercial video AI use cases, the VBench gap between open source and Sora stopped mattering this week.
Quality comparison is one part of the story. Cost comparison is where the argument becomes unanswerable.
OpenAI's Sora API charges approximately $0.04 per second of generated video at standard quality. Ten hours of video represents 36,000 seconds. At $0.04 per second, that's $1,440 for 10 hours of AI video output.
For a mid-size content agency producing social media video at volume, 10 hours per week is not unusual. Annualized: over $74,000 per year in Sora API costs for a single high-volume team.
The $20/month ChatGPT Pro subscription includes Sora access but with generation limits that don't scale to professional production volumes. Once a team hits professional volume, the API rate becomes the relevant number.
An RTX 4090 GPU costs approximately $1,800 new (or $1,200-1,400 on the used market). At current US electricity rates (~$0.12/kWh), with the card drawing approximately 450W under load, the marginal cost of local generation is electricity alone.
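A minimal sketch of that comparison, using the rates quoted in this article. The GPU-hours figure is the one assumption to adjust: the $0.19 result implies heavily batched, faster-than-real-time generation, so plug in your own measured throughput.

```python
VIDEO_SECONDS = 10 * 3600          # 10 hours of finished footage

# Cloud rates quoted in this article.
sora = 0.04 * VIDEO_SECONDS        # $1,440.00
together = 0.002 * VIDEO_SECONDS   # $72.00

# Local RTX 4090: marginal cost is electricity, ~450 W under load at $0.12/kWh.
def local_cost(gpu_hours: float, watts: int = 450, rate_kwh: float = 0.12) -> float:
    return gpu_hours * (watts / 1000) * rate_kwh

print(f"Sora API:      ${sora:,.2f}")
print(f"Together AI:   ${together:,.2f}")
print(f"Local (3.5 h): ${local_cost(3.5):.2f}")   # ≈ $0.19
```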
The math: $1,440 for Sora versus $0.19 locally, a 7,500x cost differential for comparable quality on the use cases that drive most commercial production volume. At Sora's $144 per hour of output, the GPU pays for itself in API savings after fewer than 13 hours of generated video.
Not every creator wants to manage local GPU infrastructure. For them, Together AI currently runs LTX-Video 2.3 inference at $0.002 per second, twenty times cheaper than Sora's $0.04. Generating 10 hours of video through Together AI costs approximately $72, versus $1,440 on Sora.
A 20x cost advantage with no hardware investment, no infrastructure management, and no local VRAM constraints. That is the comparison every enterprise procurement team will run when evaluating video AI spend in 2026.
| Generation method | Cost per 10 hrs video | Hardware requirement | API key needed |
|---|---|---|---|
| Sora API (OpenAI) | $1,440 | None (cloud) | ✓ |
| Runway Gen-3 credits | ~$800-1,200 | None (cloud) | ✓ |
| Together AI (LTX 2.3) | ~$72 | None (cloud) | ✓ |
| Local RTX 4090 (LTX 2.3) | ~$0.19 | RTX 4090 | ✗ |
| Local RTX 4090 (Helios) | ~$0.25 | RTX 4090 | ✗ |
| Local RTX 3090 (CogVideoX 5B) | ~$0.15 | RTX 3090 | ✗ |
The economics are no longer close. For anyone generating video at professional volume, the cost argument for paid closed-source platforms collapses unless those platforms offer capabilities open source cannot.
The competitive pressure from this wave lands differently on each paid video AI service.
Sora is the most insulated, for now. Its VBench lead is real. ChatGPT integration gives it distribution that no open source project can match through model quality alone. The safety and content moderation built into the platform matters for enterprise customers in regulated industries.
The risk for Sora is not that open source matches it tomorrow. The risk is trajectory. At the rate the open source field is moving, the VBench gap closes to 2-3 points within 12-18 months, at which point quality differentiation alone cannot justify a 20x price premium. Consumer subscription tiers aimed at casual creators face the most immediate pressure, because casual creators are exactly the audience LTX 2.3's ComfyUI workflows serve at zero marginal cost.
Runway is in a complicated position. Gen-3 Alpha's benchmark scores are now matched by the leading open source models. Yet Runway charges subscription fees and credit costs for that quality tier.
The value Runway offers has never been purely model quality. It's filmmaker-oriented workflow, canvas-based editing tools, and a production-friendly interface that non-technical creators can use without ComfyUI expertise. That value remains real. The question is what the convenience premium is worth when the underlying model quality gap has closed.
Runway's answer is to move up the value chain. The Runway $315 million bet on world models represents a bet that the future of competitive advantage in video AI is not generation quality but world simulation, interactive editing, and production pipeline integration. If that bet pays off, Runway survives. If the world model gap also closes to open source, Runway's moat disappears.
Pika's product targets ease of use: no ComfyUI, no local GPU, a simple interface for casual video creation. Consumer users making 10 clips a month aren't running cost-per-second calculations.
The long-term problem for Pika: its ceiling is constrained by the same economics that hurt Runway. As ComfyUI-based frontends become more polished and hosted LTX 2.3 services on Together AI and similar platforms become accessible without technical configuration, the UX moat narrows. Pika needs to offer something that hosted open source inference does not. Currently, that something is primarily the consumer brand and the polished onboarding experience.
Kling offers 4K output and up to 120-second clip generation. No current open source model matches either capability. For production use cases requiring extended high-resolution footage, Kling retains a real quality advantage.
The pressure from this wave on Kling is indirect: HunyuanVideo v2, a direct competitor from within China's AI market, is now sophisticated enough that Kling's domestic market position is under genuine challenge for anything below its maximum resolution and length capabilities.
Services that survive this wave will build moats around things open source cannot easily replicate: distribution scale, integrated production workflows, fine-tuning on proprietary or licensed datasets, enterprise compliance frameworks, content moderation at scale, and SLA-backed uptime guarantees. Raw generation quality is no longer a defensible moat.
The broader AI venture capital data from Q1 2026 shows that investors are already pricing this shift. Funding is moving toward AI infrastructure and workflow integration rather than foundational model development for video. That capital flow is a leading indicator of where durable value will accumulate.
The velocity of community adoption following these releases has been striking, even by open source AI standards.
Within 12 hours of LTX-Video 2.3's HuggingFace publication, r/StableDiffusion had multiple threads in the top 10, including a megathread with hundreds of comments sharing outputs, workflow configurations, and hardware recommendations. Tutorial videos on YouTube appeared within the first day from channels with six-figure subscriber counts historically covering Stable Diffusion image generation. YouTube views across LTX 2.3 tutorial videos crossed 1 million within four days of release.
The ComfyUI integration is the primary accelerant. Because LTX 2.3 ships with official node support, existing Stable Diffusion creators can add video generation to workflows they already understand, using tools they already own. The time from download to first useful output is measured in minutes. Within 24 hours, the community had derivative workflows covering every major use case.
Helios saw similar momentum despite its anonymous origin. The mystery made it feel like a discovery rather than a product launch. That distinction matters for viral adoption: people share discoveries differently than product announcements. The organic sharing from "look what I found" beats "look what launched" in almost every creative AI community context.
This is the same pattern Stable Diffusion created in 2022. The model was impressive. The community tooling became dominant. Automatic1111, ControlNet, IP-Adapter, and thousands of extensions built by unpaid contributors created collective capabilities that no single company's product roadmap could match on cost or iteration speed. LTX 2.3 and Helios are starting that same cycle for video, with a larger starting community and more accessible hardware than Stable Diffusion had at its equivalent moment.
The persistence of this pattern across AI creative tools isn't surprising once you understand the incentive structure. Open source creative communities contribute because the tools directly improve their own work. The return on contribution is immediate and personal. Closed platforms have to pay for every workflow improvement; open source communities build them for free because building them is the point.
The open source video AI acceleration creates a clear strategic decision point for every paid video AI platform. The options are not symmetric.
Option 1: Compete on generation quality. This is the weakest position. VBench scores for open source models have climbed from around 75 in mid-2025 to 83 today. Sora's estimated lead is 4-6 points. At that trajectory, quality parity arrives in roughly 12 months. Platforms that build their entire moat on generation quality will lose that moat.
Option 2: Compete on workflow and distribution. This is what Runway is attempting with its $315 million world model investment. Build capabilities that require sustained infrastructure: interactive editing, physics simulation, scene persistence across clips, integration with professional production tools like Adobe Premiere and DaVinci Resolve. Open source models generate video well. They don't (yet) support the full production workflow that professional studios need.
Option 3: Compete on compliance and enterprise features. Content moderation at scale, SLA-backed uptime, enterprise security certifications, team collaboration features, and output rights guaranteed in writing are all things open source models cannot offer out of the box. For media companies, advertising agencies with brand safety requirements, and enterprises in regulated industries, these features justify premium pricing regardless of the VBench gap.
Option 4: Compete on fine-tuning and personalization. Training video models on proprietary or licensed content is something individuals and small teams cannot easily do. Platforms that offer fine-tuning services on licensed IP, consistent character identity across a production, or brand-specific aesthetic matching build a capability that scales down to open source only with significant technical effort.
The platforms most at risk are those pursuing option 1 only. The platforms with viable paths forward are those already building options 2, 3, or 4.
For the creative community, the correct read is straightforward: open source video AI tools are now good enough for most professional applications, the economics are orders of magnitude better than paid alternatives, and the community infrastructure for support, workflows, and fine-tuning is developing faster than any single company's product team can match.
Open source video AI crossed the professional usability threshold in March 2026. That's the headline, and the data supports it.
VBench scores: LTX 2.3 at 83.1, effectively matching Runway Gen-3 Alpha at an estimated 83.5, with Sora 4-6 points ahead. HunyuanVideo v2 at 82.5. Helios at 81.7 with native 1080p. CogVideoX 5B at 79.8 on accessible hardware.
Economics: $0.19 per 10 hours locally versus $1,440 on Sora API. $72 per 10 hours through Together AI versus $800-1,200 through Runway credits. The cost argument for paid closed-source platforms no longer holds for professional production volume.
Community infrastructure: ComfyUI workflows for every major use case within 24 hours. Tutorial videos crossing 1 million views within four days. Developer communities publishing derivative models, LoRA checkpoints, and fine-tuned variants within a week.
The 4-6 point VBench gap to Sora remains real. It matters for premium cinematic production: multi-scene coherence across long clips, physics simulation accuracy, abstract prompt interpretation at the highest level of creative complexity. For those use cases, Sora still earns its price. For the other 80%, the calculation has changed permanently.
What happens next: the VBench gap closes over the next 12 months. The community tooling around LTX 2.3, Helios, and HunyuanVideo grows. Hardware costs fall as next-generation consumer GPUs ship. More anonymous labs produce capable models without the overhead of a company press strategy. The rate of improvement is not slowing. It is accelerating.
For independent filmmakers, game studios, advertising agencies, and social content teams: the tools are here, the economics make sense, and the community infrastructure is in place. The open source video AI era started on March 9, 2026.
LTX-Video 2.3 is an open source video generation model released by Lightricks on March 9, 2026, with 13 billion parameters and a VBench score of 83.1. It generates five-second 720p video in approximately 8 seconds on an RTX 4090 and runs locally at near-zero marginal cost. It matters because it matches Runway Gen-3 Alpha quality at zero ongoing cost, with full ComfyUI integration that the existing Stable Diffusion community can adopt immediately.
Helios scores 81.7 on VBench versus LTX 2.3's 83.1, making it slightly lower on the overall benchmark. However, Helios offers native 1080p output as a base capability, which LTX 2.3 does not. For professional users who need broadcast-quality resolution without post-processing upscaling, Helios is the better choice despite the lower overall score.
VBench is the primary benchmark for video generation quality, developed by researchers at Nanyang Technological University. It measures AI-generated video across 16 dimensions including temporal consistency, motion smoothness, aesthetic quality, subject consistency, and background coherence. Scores range from 0-100, with higher scores indicating better overall quality. Independent analysis of VBench methodology appears in Ars Technica's video AI coverage.
Generating 10 hours of video locally on an RTX 4090 costs approximately $0.19 in electricity and GPU amortization, based on current US electricity rates of $0.12/kWh and a 3-year GPU lifecycle. The RTX 4090 itself costs approximately $1,800 new. The hardware investment pays for itself in API cost savings after fewer than 13 hours of generated video compared to Sora's $0.04 per second rate.
No local GPU is required: Together AI offers hosted LTX-Video 2.3 inference at $0.002 per second, twenty times cheaper than Sora at $0.04 per second. Generating 10 hours of video through Together AI costs approximately $72 versus $1,440 through Sora's API.
Consumer and prosumer tiers from Sora ($20/month ChatGPT Pro), Runway Gen-3 ($12/month), and Pika 2.0 face the most direct pressure. Services that compete on enterprise compliance features, extended clip length (Kling at 120 seconds), 4K native resolution, or integrated production workflow tools maintain near-term differentiation. Coverage of competitive dynamics is at TechCrunch and The Verge.
For the majority of professional applications, open source is now good enough. It crossed the professional usability threshold with the March 2026 release wave. VBench scores for LTX 2.3 and HunyuanVideo v2 are on par with commercial tools from Runway. The remaining gap to Sora matters for premium cinematic work requiring extended scene coherence and physics-accurate simulation, but not for social media, product demos, B-roll, and promotional content.
The pattern is structurally identical. Stable Diffusion arrived in August 2022 as a capable open source image model, and the community-built library of extensions and workflows became dominant within months. LTX 2.3 and Helios are triggering the same cycle for video, with a larger starting community and more accessible hardware than existed for image AI at its equivalent inflection point. The main difference is speed: the video AI version of this cycle is moving roughly 18 months faster.
LTX-Video 2.3 ships under a commercially permissive license. Check the model card on HuggingFace for the specific license terms and any output rights conditions before building commercial products. Helios's license terms are stated on its HuggingFace model card. Output rights for both differ from Sora, which retains the right to use generated content for model training under its current terms of service.
The full 13B LTX-Video 2.3 model requires approximately 24GB VRAM, making the RTX 4090 (24GB) the minimum consumer GPU for the full model. The 2B distilled variant runs on GPUs with 16GB VRAM, including the RTX 4080. CogVideoX 5B runs on 16GB configurations. Cloud inference through Together AI eliminates local GPU requirements entirely.
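That 24GB figure tracks with simple parameter math: in 16-bit precision the weights alone land right at the card's capacity, which is why real setups lean on CPU offloading or lower-precision weights to leave room for activations and the VAE. A quick check:

```python
# Weights-only VRAM estimate: 13B parameters at 2 bytes each (bf16/fp16).
params = 13e9
print(f"{params * 2 / 2**30:.1f} GiB")   # ≈ 24.2 GiB before activations
```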
LTX-Video 2.3 is available from the Lightricks organization on HuggingFace. Helios, HunyuanVideo v2, and CogVideoX 5B are available through their respective HuggingFace organization pages. Official ComfyUI workflows for LTX 2.3 are maintained in the Lightricks GitHub repository.
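For readers who prefer scripts to ComfyUI, here is a hypothetical diffusers quickstart. It assumes LTX-Video 2.3 keeps the `LTXPipeline` interface that earlier LTX-Video releases use in diffusers; the repo id, resolution, and step count are placeholders to check against the model card.

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Repo id is a placeholder; use the 2.3 checkpoint named on the model card.
pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="aerial drone shot of a coastline at golden hour, cinematic",
    width=1280,
    height=720,
    num_frames=121,              # ~5 seconds at 24 fps
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "clip.mp4", fps=24)
```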
The v2 update focuses specifically on temporal consistency, the dimension where AI video most visibly fails. Tencent's 3D attention architecture processes spatial and temporal relationships simultaneously, and v2 refines the motion planning component to improve long-range coherence across the full clip duration. Objects maintain consistent behavior not just frame-to-frame but across the entire generated video. VBench score sits at approximately 82.5.
The lab behind Helios has disclosed no company affiliation, team information, or institutional connection. This appears to be deliberate. Theories include a stealth lab preparing for a commercial reveal, an academic team avoiding IP complications, or a well-resourced individual project. The anonymity has not slowed adoption. If anything, the "discovery" framing drove more organic sharing than a typical product launch would have produced.
CogVideoX 5B at 5 billion parameters is the most accessible entry point: it runs on GPUs with 16GB VRAM, generates usable video for common use cases, and requires less compute than LTX 2.3 or HunyuanVideo v2. For beginners with RTX 4090 hardware who want maximum quality, LTX 2.3 with the official ComfyUI workflows provides the smoothest onboarding experience.
On an RTX 4090, LTX-Video 2.3 generates a 5-second 720p clip in approximately 8 seconds. The 2B distilled variant generates the same clip in roughly 3 seconds with a modest quality reduction. For comparison, Sora and Runway generation times vary by server load, subscription tier, and queue position, with no guaranteed turnaround time.
Sora's VBench score is estimated at 87-89. LTX-Video 2.3 scores 83.1. The gap is 4-6 points. This gap is meaningful for extended cinematic content requiring complex physics and multi-scene coherence, and less meaningful for short-form social media, product demos, and B-roll that represents most commercial video AI volume.
ComfyUI is the dominant node-based workflow tool for the Stable Diffusion image generation community, with hundreds of thousands of active users who have existing workflow libraries, plugin collections, and technical familiarity. LTX 2.3's official ComfyUI integration means the entire existing image AI community can add video generation to their workflows without learning new tools. This is why LTX 2.3 adoption spread faster than any previous open source video release.
LoRA fine-tuning for LTX 2.3 and HunyuanVideo v2 is already available through community tools, with training scripts published on HuggingFace and GitHub. Fine-tuning on specific datasets allows consistent character identity, brand-specific aesthetics, or domain-specific motion styles. This capability is technically demanding but increasingly accessible through community tooling.
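As a sketch of what that tooling looks like, here is an illustrative `peft` adapter configuration. The target module names are assumptions based on common diffusers attention-layer naming; the community training scripts define the real ones.

```python
from peft import LoraConfig

# Illustrative LoRA config for a DiT video model's attention projections.
lora_config = LoraConfig(
    r=16,                      # adapter rank: capacity vs. checkpoint size
    lora_alpha=32,             # scaling applied to the adapter's output
    lora_dropout=0.05,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # placeholder names
)
# transformer.add_adapter(lora_config)  # then train only the adapter weights
```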
Use cases that still favor Sora or paid alternatives: extended clips above 60 seconds at high quality, content requiring physics-accurate simulation for complex interactions, very long clip multi-scene coherence for narrative production, and enterprise contexts requiring guaranteed content moderation, SLA-backed uptime, and formal output rights agreements. For everything else, open source models are now production-ready.
The open source video AI field went from VBench scores around 75 in mid-2025 to 83+ in March 2026, closing roughly two-thirds of the gap to Sora in 8 months. If that rate continues, parity with Sora's current quality level arrives within 12-18 months. The rate is unlikely to slow: more labs, more compute, more community data, and hardware improvements all push the same direction.
For creators with access to an RTX 4090, or willing to use Together AI's hosted inference, switching makes sense for most professional applications. The quality is sufficient, the cost differential is dramatic, and the ComfyUI workflows for LTX 2.3 are mature enough for production use. The remaining reasons to stay with paid platforms are workflow integration, content moderation guarantees, or use cases requiring capabilities open source doesn't yet cover (extended clips, 4K native).
The open source video AI era began on March 9, 2026. Explore the LTX-Video 2.3 model page on HuggingFace and the official ComfyUI workflows to start generating.