Friday, February 27, 2026

Cisco IT deploys AI-ready data center in weeks, while scaling for the future


Cisco IT designed AI-ready infrastructure with Cisco compute, best-in-class NVIDIA GPUs, and Cisco networking that supports AI model training and inferencing across dozens of use cases for Cisco product and engineering teams.

It’s no secret that the pressure to implement AI across the business presents challenges for IT teams. It challenges us to deploy new technology faster than ever before and to rethink how data centers are built to meet growing demands across compute, networking, and storage. While the pace of innovation and business growth is exhilarating, it can also feel daunting.

How do you quickly build the data center infrastructure needed to power AI workloads and keep up with critical business needs? That’s exactly what our team, Cisco IT, was facing.

The ask from the business

We were approached by a product team that needed a way to run AI workloads that are used to develop and test new AI capabilities for Cisco products. It would eventually support model training and inferencing for multiple teams and dozens of use cases across the business. And they needed it done quickly. Given the need for the product teams to get innovations to our customers as fast as possible, we had to deliver the new environment in just three months.

The technology requirements

We began by mapping out the requirements for the new AI infrastructure. A non-blocking, lossless network was essential for the AI compute fabric to ensure reliable, predictable, and high-performance data transmission across the AI cluster. Ethernet was the clear choice. Other requirements included:

  • Intelligent buffering, low latency: As in any good data center, these are essential for maintaining smooth data flow and minimizing delays, as well as improving the responsiveness of the AI fabric.
  • Dynamic congestion avoidance for various workloads: AI workloads can fluctuate significantly in their demands on network and compute resources. Dynamic congestion avoidance would ensure that resources were allocated efficiently, prevent performance degradation during peak usage, maintain consistent service levels, and prevent bottlenecks that could disrupt operations.
  • Dedicated front-end and back-end networks, non-blocking fabric: With a goal of building scalable infrastructure, a non-blocking fabric would ensure sufficient bandwidth for data to flow freely and enable the high-speed data transfer that is crucial for handling the large data volumes typical of AI applications. By segregating our front-end and back-end networks, we could improve security, performance, and reliability.
  • Automation for Day 0 to Day 2 operations: From the day we deployed, through configuration and ongoing management, we wanted to reduce any manual intervention to keep processes quick and minimize human error.
  • Telemetry and visibility: Together, these capabilities would provide insight into system performance and health, allowing for proactive management and troubleshooting.
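To make the first two requirements concrete: a lossless Ethernet fabric for AI traffic typically relies on Priority Flow Control (PFC) and ECN-based congestion marking on the RoCE traffic class. The sketch below is a minimal compliance check over a queuing-policy description; the dictionary schema and field names are illustrative assumptions, not an actual NX-OS or Nexus Dashboard export format.

```python
# Validate that a switch queuing policy meets lossless-fabric requirements:
# PFC enabled on the RoCE class, ECN marking configured, and the class
# mapped to a no-drop queue. The policy dict schema is illustrative only.

def check_lossless_policy(policy: dict) -> list:
    """Return a list of human-readable violations (empty list = compliant)."""
    violations = []
    roce = policy.get("classes", {}).get("roce")
    if roce is None:
        return ["no 'roce' traffic class defined"]
    if not roce.get("pfc_enabled", False):
        violations.append("PFC is disabled on the RoCE class")
    if not roce.get("ecn_enabled", False):
        violations.append("ECN marking is not configured")
    if roce.get("queue_type") != "no-drop":
        violations.append("RoCE class is not mapped to a no-drop queue")
    return violations


good_policy = {
    "classes": {
        "roce": {"pfc_enabled": True, "ecn_enabled": True, "queue_type": "no-drop"}
    }
}
bad_policy = {"classes": {"roce": {"pfc_enabled": False}}}

print(check_lossless_policy(good_policy))  # []
print(check_lossless_policy(bad_policy))
```

Running checks like this across every fabric switch, rather than eyeballing configurations, is the kind of guardrail that the automation and telemetry requirements above are meant to enable.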

The plan – with a few challenges to overcome

With the requirements in place, we began figuring out where the cluster could be built. The existing data center facilities weren’t designed to support AI workloads. We knew that building from scratch with a full data center refresh would take 18-24 months, which was not an option. We needed to deliver an operational AI infrastructure in a matter of weeks, so we leveraged an existing facility, with minor changes to cabling and device distribution to accommodate it.

Our next considerations were around the data being used to train models. Since some of that data wouldn’t be stored locally in the same facility as our AI infrastructure, we decided to replicate data from other data centers into our AI infrastructure storage systems to avoid performance issues related to network latency. Our network team had to ensure sufficient network capacity to handle this data replication into the AI infrastructure.
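Sizing that replication capacity comes down to simple arithmetic: how many bits must move, in what window, over a link that never runs at line rate. The sketch below shows the calculation; the dataset size, window, and efficiency factor are made-up illustrative inputs, not figures from our deployment.

```python
# Estimate the sustained link bandwidth needed to replicate a training
# dataset into the AI cluster's storage within a given time window.
# Inputs are illustrative; substitute your own dataset size and window.

def required_gbps(dataset_tb: float, window_hours: float,
                  efficiency: float = 0.7) -> float:
    """Sustained Gbit/s needed, derated by a protocol/efficiency factor."""
    bits = dataset_tb * 8 * 1e12      # terabytes -> bits
    seconds = window_hours * 3600
    return bits / seconds / 1e9 / efficiency


# e.g. 500 TB replicated overnight (10 h) at 70% effective link utilization
print(round(required_gbps(500, 10), 1))  # 158.7
```

Numbers like this are why the network team had to validate inter-data-center capacity up front: replication traffic can rival the fabric traffic itself.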

Now, on to the actual infrastructure. We designed the heart of the AI infrastructure with Cisco compute, best-in-class GPUs from NVIDIA, and Cisco networking. On the networking side, we built a front-end Ethernet network and a back-end lossless Ethernet network. With this model, we were confident that we could quickly deploy advanced AI capabilities in any environment and continue to add them as we brought more facilities online.

Products:

Supporting a growing environment

After making the initial infrastructure available, the business added more use cases every week, and we added additional AI clusters to support them. We needed a way to make it all easier to manage, including managing the switch configurations and monitoring for packet loss. We used Cisco Nexus Dashboard, which dramatically streamlined operations and ensured we could grow and scale for the future. We were already using it in other parts of our data center operations, so it was easy to extend to our AI infrastructure and didn’t require the team to learn an additional tool.
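The core of packet-loss monitoring, whatever tool collects the telemetry, is a delta check over interface drop counters between snapshots. The sketch below shows that logic in isolation; the snapshot format and interface names are hand-made examples, not a Nexus Dashboard schema, and in practice the dashboard handles collection and alerting for us.

```python
# Flag interfaces whose drop counters increased between two telemetry
# snapshots - the core of a packet-loss watchdog for a lossless fabric,
# where any sustained drop growth signals a misconfiguration.
# Snapshot format (interface -> cumulative drops) is illustrative only.

def find_lossy_interfaces(prev: dict, curr: dict) -> dict:
    """Map interface -> new drops since the previous snapshot."""
    return {
        intf: curr[intf] - prev.get(intf, 0)
        for intf in curr
        if curr[intf] - prev.get(intf, 0) > 0
    }


prev = {"Ethernet1/1": 0, "Ethernet1/2": 12}
curr = {"Ethernet1/1": 0, "Ethernet1/2": 40, "Ethernet1/3": 3}

print(find_lossy_interfaces(prev, curr))  # {'Ethernet1/2': 28, 'Ethernet1/3': 3}
```

On a fabric that is supposed to be lossless, any non-empty result from a check like this is worth an alert, which is why having it wired into an existing operations tool mattered.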

The results

Our team was able to move fast and overcome several hurdles in designing the solution. We were able to design and deploy the back end of the AI fabric in under three hours, and to deploy the entire AI cluster and fabrics in three months, which was 80% faster than the alternative of a full rebuild.

Today, the environment supports more than 25 use cases across the business, with more added every week. These include:

  • Webex Audio: Improving codec development for noise cancellation and lower-bandwidth data prediction
  • Webex Video: Model training for background replacement, gesture recognition, and face landmarks
  • Custom LLM training for cybersecurity products and capabilities

Not only were we able to support the needs of the business today, but we’re also designing how our data centers need to evolve for the future. We’re actively building out more clusters and will share additional details on our journey in future blogs. The modularity and flexibility of Cisco’s networking, compute, and security give us confidence that we can keep scaling with the business.

 

