Wednesday, October 22, 2025

The future of cloud AI infrastructure: Inside Huawei's UnifiedBus architecture


The challenge of building efficient cloud AI infrastructure has always been about scale – not just adding more servers, but making those servers work together seamlessly. At Huawei Connect 2025, the Chinese technology giant unveiled an approach that changes how cloud providers and enterprises can pool computing resources.

Instead of managing thousands of independent servers that communicate through traditional networking, Huawei's SuperPod technology creates what executives describe as unified systems in which physical infrastructure behaves as a single logical machine. For cloud providers building AI services and enterprises deploying private AI clouds, this represents a significant shift in how infrastructure can be architected, managed, and scaled.

The cloud infrastructure problem SuperPod solves

Traditional cloud AI infrastructure faces a persistent challenge: as clusters grow larger, computing efficiency actually decreases. This happens because individual servers in a cluster remain largely independent, communicating through network protocols that introduce latency and complexity. The result is what industry professionals call "scaling penalties" – where adding more hardware doesn't proportionally increase usable computing power.
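The effect is easy to illustrate with a toy model. The function and the per-peer overhead figure below are illustrative assumptions, not Huawei's measurements; the point is only that when each server pays a coordination cost per peer, usable compute grows sublinearly with cluster size.

```python
def effective_throughput(n_servers, per_server_flops=1.0, comm_overhead=0.02):
    """Toy model of the 'scaling penalty': each extra server adds
    coordination cost with its peers, so usable compute grows sublinearly.
    comm_overhead is an assumed per-peer fraction, not a measured value."""
    # Fraction of each server's time lost to coordinating with the other
    # n-1 peers, capped so throughput never goes negative.
    lost = min(comm_overhead * (n_servers - 1), 0.9)
    return n_servers * per_server_flops * (1 - lost)

for n in (8, 64, 512):
    total = effective_throughput(n)
    print(f"{n} servers -> {total:.1f} compute units ({total / n:.0%} efficiency)")
```

Under these assumptions, per-server efficiency collapses as the cluster grows – which is exactly the curve that tighter interconnects aim to flatten.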

Yang Chaobin, Huawei's Director of the Board and CEO of the ICT Business Group, explained that the company developed "the groundbreaking SuperPod architecture based on our UnifiedBus interconnect protocol. The architecture deeply interconnects physical servers so that they can learn, think, and reason like a single logical server."

This isn't just faster networking; it's a re-architecting of how cloud AI infrastructure can be built.

The technical foundation: UnifiedBus protocol

At the core of Huawei's cloud AI infrastructure approach is UnifiedBus, an interconnect protocol designed specifically for large-scale resource pooling. The protocol addresses two key infrastructure challenges that have limited cloud AI deployments: maintaining reliability over long distances in data centres, and optimising the bandwidth-latency trade-off that affects performance.

Traditional data centre connectivity relies on either copper cables (high bandwidth, short range, typically connecting just two racks) or optical cables (longer range but with reliability concerns at scale). For cloud providers building infrastructure to support thousands of AI processors, neither option is proving ideal.

Eric Xu, Huawei's Deputy Chairman and Rotating Chairman, said solving these fundamental connectivity challenges was essential to the company's cloud AI infrastructure strategy. Drawing on what he described as Huawei's three decades of connectivity expertise, Xu detailed the breakthrough features: "We have built reliability into every layer of our interconnect protocol, from the physical layer and data link layer, all the way up to the network and transmission layers. There is 100-ns-level fault detection and protection switching on optical paths, making any intermittent disconnections or faults of optical modules imperceptible at the application layer."

The result is what Huawei describes as an optical interconnect that is 100 times more reliable than conventional approaches, supporting connections over 200 metres in data centres while maintaining the reliability characteristics typically associated with copper connections.

SuperPod configurations: From enterprise to hyperscale

Huawei's cloud AI infrastructure product line spans multiple scales, each designed for different deployment scenarios. The Atlas 950 SuperPod represents the flagship implementation, featuring up to 8,192 Ascend 950DT AI processors configured in 160 cabinets occupying 1,000 square metres of data centre space.

The system delivers 8 EFLOPS in FP8 precision and 16 EFLOPS in FP4 precision, with 1,152 TB of total memory capacity. The interconnect specifications reveal the architecture's ambitions: 16 PB/s of bandwidth across the entire system.

As Xu noted, "This means a single Atlas 950 SuperPod can have an interconnect bandwidth over 10 times higher than the entire globe's total peak Internet bandwidth." This level of internal connectivity is what enables the system to maintain near-linear performance scaling – adding more processors genuinely increases usable computing power proportionally.
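The headline figures imply per-processor budgets that can be checked with simple arithmetic. This sketch just divides the system-level numbers quoted above; any rounding is in the vendor figures, not the maths.

```python
# Published Atlas 950 SuperPod figures, as quoted in the article.
processors = 8192       # Ascend 950DT AI processors
fp8_eflops = 8          # system-wide FP8 compute
bandwidth_pb_s = 16     # system-wide interconnect bandwidth

# Implied per-processor budgets.
pflops_per_chip = fp8_eflops * 1000 / processors    # EFLOPS -> PFLOPS
tb_s_per_chip = bandwidth_pb_s * 1000 / processors  # PB/s -> TB/s

print(f"~{pflops_per_chip:.2f} PFLOPS FP8 per processor")
print(f"~{tb_s_per_chip:.2f} TB/s interconnect bandwidth per processor")
```

Roughly 1 PFLOPS of FP8 compute and about 2 TB/s of fabric bandwidth per chip – a bandwidth-to-compute ratio far beyond what conventional Ethernet-attached servers provide.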

For larger cloud deployments, the Atlas 960 SuperPod incorporates 15,488 Ascend 960 processors in 220 cabinets across 2,200 square metres, delivering 30 EFLOPS in FP8 and 60 EFLOPS in FP4, with 4,460 TB of memory and 34 PB/s of interconnect bandwidth. The Atlas 960 will be available in the fourth quarter of 2027.

Implications for cloud service delivery

Beyond the flagship SuperPod products, Huawei introduced cloud AI infrastructure configurations designed specifically for enterprise data centres. The Atlas 850 SuperPod, positioned as "the industry's first air-cooled SuperPoD server designed for enterprises," features eight Ascend NPUs and supports flexible multi-cabinet deployment of up to 128 units with 1,024 NPUs.

Significantly, this configuration can be deployed in standard air-cooled equipment rooms, avoiding the infrastructure modifications required for liquid cooling systems. For cloud providers and enterprises, this offers practical deployment flexibility: organisations can implement SuperPod architecture without necessarily requiring full data centre redesigns, potentially accelerating adoption timelines.

SuperCluster architecture: Hyperscale cloud deployment

Huawei's vision extends beyond individual SuperPods to what the company calls SuperClusters – massive cloud AI infrastructure deployments comprising multiple interconnected SuperPods. The Atlas 950 SuperCluster will incorporate 64 Atlas 950 SuperPods, creating a system with over 520,000 AI processors in more than 10,000 cabinets, delivering 524 EFLOPS in FP8 precision.

A key technical decision affects how cloud providers might deploy these systems. The Atlas 950 SuperCluster supports both UBoE (UnifiedBus over Ethernet) and RoCE (RDMA over Converged Ethernet) protocols. UBoE enables UnifiedBus to run over standard Ethernet infrastructure, allowing cloud providers to potentially integrate SuperPod technology with existing data centre networks.

According to Huawei's specifications, UBoE clusters demonstrate lower static latency and higher reliability compared with RoCE clusters, while requiring fewer switches and optical modules. For cloud providers planning large-scale deployments, this could translate to both performance and economic advantages.

The Atlas 960 SuperCluster, scheduled for fourth-quarter 2027 availability, will integrate more than a million NPUs to deliver 2 ZFLOPS (zettaFLOPS) in FP8 and 4 ZFLOPS in FP4. The specifications position the system for what Xu described as future AI models "with over 1 trillion or 10 trillion parameters."
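The cluster and pod figures quoted above hang together, which is worth a quick consistency check. The arithmetic below uses only numbers from this article; the "~1M NPUs" figure is the article's round number, so the per-NPU result for the cluster is approximate.

```python
# Atlas 950 SuperCluster: 64 pods of 8,192 processors each.
cluster_950_chips = 64 * 8192
print(f"Atlas 950 SuperCluster: {cluster_950_chips:,} processors")
# -> 524,288, matching "over 520,000" (and the 524 EFLOPS FP8 figure,
#    at roughly 1 PFLOPS per chip).

# Atlas 960 generation: per-NPU FP8 throughput implied by pod vs cluster specs.
pod_960_pflops = 30 * 1000 / 15488   # 30 EFLOPS over 15,488 NPUs
cluster_960_pflops = 2_000_000 / 1_000_000  # 2 ZFLOPS over ~1M NPUs
print(f"Atlas 960 per-NPU FP8: ~{pod_960_pflops:.2f} PFLOPS (pod figures), "
      f"~{cluster_960_pflops:.0f} PFLOPS (cluster figures)")
```

Both routes land at roughly 2 PFLOPS of FP8 per Ascend 960 NPU, suggesting the cluster numbers are a straight multiple of the pod specifications rather than assuming additional per-chip gains.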

Beyond AI: General-purpose cloud infrastructure

The SuperPod architecture's implications extend beyond AI workloads into general-purpose cloud computing through the TaiShan 950 SuperPod. Built on Kunpeng 950 processors featuring up to 192 cores and 384 threads, the system addresses enterprise requirements for mission-critical applications traditionally run on mainframes, Oracle's Exadata database servers, and mid-range computers.

The TaiShan 950 SuperPod supports up to 16 nodes with 32 processors and 48 TB of memory, incorporating memory pooling, SSD pooling, and DPU (Data Processing Unit) pooling. When integrated with Huawei's distributed GaussDB database, the system delivers what the company claims is a 2.9x performance improvement over traditional architectures without requiring application modifications.

For cloud providers serving enterprise customers, this presents significant opportunities for cloud-native infrastructure. Beyond databases, Huawei claims the TaiShan 950 SuperPod improves memory utilisation by 20% in virtualised environments and accelerates Spark workloads by 30%.

The open architecture strategy

Perhaps most significant for the broader cloud AI infrastructure market, Huawei announced that the UnifiedBus 2.0 technical specifications will be released as open standards. The company is providing open access to both hardware and software components: NPU modules, air-cooled and liquid-cooled blade servers, AI cards, CPU boards, cascade cards, CANN compiler tools, Mind series application kits, and openPangu foundation models – all by December 31, 2025.

Yang framed this as ecosystem development: "We are committed to our open-hardware and open-source-software approach that will help more partners develop their own industry-scenario-based SuperPod solutions. This will accelerate developer innovation and foster a thriving ecosystem."

For cloud providers and system integrators, this open approach potentially lowers barriers to deploying SuperPod-based infrastructure. Rather than being locked into single-vendor solutions, partners can develop customised implementations using the UnifiedBus specifications.

Market validation and deployment reality

The cloud AI infrastructure architecture has already seen real-world deployment. More than 300 Atlas 900 A3 SuperPod units have been shipped in 2025, deployed for more than 20 customers across the internet, finance, carrier, electric power, and manufacturing sectors. The deployment scale provides some validation that the architecture functions beyond laboratory demonstrations.

Xu acknowledged the context shaping Huawei's infrastructure strategy: "The Chinese mainland will lag behind in semiconductor manufacturing process nodes for a relatively long time," adding that "sustainable computing power can only be achieved with process nodes that are practically available."

The statement frames the SuperPod architecture as a strategic response to constraints – achieving competitive performance through architectural innovation rather than solely through advanced semiconductor manufacturing.

What this means for cloud infrastructure evolution

Huawei's SuperPod architecture represents a specific bet on how cloud AI infrastructure should evolve: toward tighter integration and resource pooling at massive scale, enabled by purpose-built interconnect technology. Whether this approach proves more effective than alternatives – such as loosely coupled clusters with sophisticated software orchestration – remains to be demonstrated in hyperscale production deployments.

For cloud providers, the open architecture strategy introduces options for building AI infrastructure without necessarily adopting the tightly integrated hardware-software approaches dominant among Western competitors. For enterprises evaluating private cloud AI infrastructure, SuperPod configurations like the air-cooled Atlas 850 present deployment paths that don't require full data centre redesigns.

The broader implication concerns how cloud AI infrastructure can be architected in markets where access to the most advanced semiconductor manufacturing remains constrained. Huawei's approach suggests that architectural innovation in interconnect, resource pooling, and system design can potentially compensate for limitations in individual processor capabilities – a proposition that will be tested as these systems scale to production workloads across diverse cloud deployment scenarios.

(Photo taken from the video of Xu's keynote speech at the opening of Huawei Connect 2025)


CloudTech News is powered by TechForge Media.
