How Cerebras + DataRobot Accelerates AI App Growth

December 17, 2024

42

Sooner, smarter, extra responsive AI functions – that’s what your customers count on. However when giant language fashions (LLMs) are sluggish to reply, person expertise suffers. Each millisecond counts.

With Cerebras’ high-speed inference endpoints, you possibly can scale back latency, velocity up mannequin responses, and keep high quality at scale with fashions like Llama 3.1-70B. By following a number of easy steps, you’ll have the ability to customise and deploy your individual LLMs, providing you with the management to optimize for each velocity and high quality.

On this weblog, we’ll stroll you thru you how you can:

Arrange Llama 3.1-70B within the DataRobot LLM Playground.
Generate and apply an API key to leverage Cerebras for inference.
Customise and deploy smarter, quicker functions.

By the tip, you’ll be able to deploy LLMs that ship velocity, precision, and real-time responsiveness.

Prototype, customise, and check LLMs in a single place

Prototyping and testing generative AI fashions usually require a patchwork of disconnected instruments. However with a unified, built-in setting for LLMs, retrieval strategies, and analysis metrics, you possibly can transfer from thought to working prototype quicker and with fewer roadblocks.

This streamlined course of means you possibly can give attention to constructing efficient, high-impact AI functions with out the trouble of piecing collectively instruments from completely different platforms.

Let’s stroll by means of a use case to see how one can leverage these capabilities to develop smarter, quicker AI functions.

Use case: Rushing up LLM interference with out sacrificing high quality

Low latency is crucial for constructing quick, responsive AI functions. However accelerated responses don’t have to come back at the price of high quality.

The velocity of Cerebras Inference outperforms different platforms, enabling builders to construct functions that really feel easy, responsive, and clever.

When mixed with an intuitive growth expertise, you possibly can:

Cut back LLM latency for quicker person interactions.
Experiment extra effectively with new fashions and workflows.
Deploy functions that reply immediately to person actions.

The diagrams under present Cerebras’ efficiency on Llama 3.1-70B, illustrating quicker response occasions and decrease latency than different platforms. This permits speedy iteration throughout growth and real-time efficiency in manufacturing.

Image showing output speed of llama 3.1 70B with Cerebras

Image showing response time of llama 3.1 70B with Cerebras

How mannequin dimension impacts LLM velocity and efficiency

As LLMs develop bigger and extra complicated, their outputs turn out to be extra related and complete — however this comes at a value: elevated latency. Cerebras tackles this problem with optimized computations, streamlined information switch, and clever decoding designed for velocity.

These velocity enhancements are already reworking AI functions in industries like prescription drugs and voice AI. For instance:

GlaxoSmithKline (GSK) makes use of Cerebras Inference to speed up drug discovery, driving greater productiveness.
LiveKit has boosted the efficiency of ChatGPT’s voice mode pipeline, attaining quicker response occasions than conventional inference options.

The outcomes are measurable. On Llama 3.1-70B, Cerebras delivers 70x quicker inference than vanilla GPUs, enabling smoother, real-time interactions and quicker experimentation cycles.

This efficiency is powered by Cerebras’ third-generation Wafer-Scale Engine (WSE-3), a customized processor designed to optimize the tensor-based, sparse linear algebra operations that drive LLM inference.

By prioritizing efficiency, effectivity, and adaptability, the WSE-3 ensures quicker, extra constant outcomes throughout mannequin efficiency.

Cerebras Inference’s velocity reduces the latency of AI functions powered by their fashions, enabling deeper reasoning and extra responsive person experiences. Accessing these optimized fashions is straightforward — they’re hosted on Cerebras and accessible through a single endpoint, so you can begin leveraging them with minimal setup.

Image showing tokens per second on Cerebras Inference

Step-by-step: The way to customise and deploy Llama 3.1-70B for low-latency AI

Integrating LLMs like Llama 3.1-70B from Cerebras into DataRobot lets you customise, check, and deploy AI fashions in just some steps. This course of helps quicker growth, interactive testing, and better management over LLM customization.

1. Generate an API key for Llama 3.1-70B within the Cerebras platform.

Image showing generating and API key on Cerebras

2. In DataRobot, create a customized mannequin within the Mannequin Workshop that calls out to the Cerebras endpoint the place Llama 3.1 70B is hosted.

Image of the model workshop on DataRobot (1)

3. Inside the customized mannequin, place the Cerebras API key throughout the customized.py file.

Image of putting Cerebras API key into custom py file in DataRobot (1)

4. Deploy the customized mannequin to an endpoint within the DataRobot Console, enabling LLM blueprints to leverage it for inference.

Image of deploying llama 3.1 70B on Cerebras in DataRobot

5. Add your deployed Cerebras LLM to the LLM blueprint within the DataRobot LLM Playground to begin chatting with Llama 3.1 -70B.

Image of adding an LLM to the playground in DataRobot

6. As soon as the LLM is added to the blueprint, check responses by adjusting prompting and retrieval parameters, and examine outputs with different LLMs straight within the DataRobot GUI.

Broaden the boundaries of LLM inference to your AI functions

Deploying LLMs like Llama 3.1-70B with low latency and real-time responsiveness is not any small job. However with the correct instruments and workflows, you possibly can obtain each.

By integrating LLMs into DataRobot’s LLM Playground and leveraging Cerebras’ optimized inference, you possibly can simplify customization, velocity up testing, and scale back complexity – all whereas sustaining the efficiency your customers count on.

As LLMs develop bigger and extra highly effective, having a streamlined course of for testing, customization, and integration, can be important for groups trying to keep forward.

Discover it your self. Entry Cerebras Inference, generate your API key, and begin constructing AI functions in DataRobot.

In regards to the creator

Kumar Venkateswar

VP of Product, Platform and Ecosystem

Kumar Venkateswar is VP of Product, Platform and Ecosystem at DataRobot. He leads product administration for DataRobot’s foundational providers and ecosystem partnerships, bridging the gaps between environment friendly infrastructure and integrations that maximize AI outcomes. Previous to DataRobot, Kumar labored at Amazon and Microsoft, together with main product administration groups for Amazon SageMaker and Amazon Q Enterprise.

Meet Kumar Venkateswar

Nathaniel Daly

Principal Product Supervisor

Nathaniel Daly is a Senior Product Supervisor at DataRobot specializing in AutoML and time collection merchandise. He’s centered on bringing advances in information science to customers such that they’ll leverage this worth to unravel actual world enterprise issues. He holds a level in Arithmetic from College of California, Berkeley.

Meet Nathaniel Daly

How Cerebras + DataRobot Accelerates AI App Growth

Prototype, customise, and check LLMs in a single place

Use case: Rushing up LLM interference with out sacrificing high quality

How mannequin dimension impacts LLM velocity and efficiency

Step-by-step: The way to customise and deploy Llama 3.1-70B for low-latency AI

Broaden the boundaries of LLM inference to your AI functions

Related Articles

Farming on the edge with autonomous robots

This intelligent rip-off practically hijacked a tech CEO’s Apple ID • Graham Cluley

Skydio X10D Military order marks its biggest-ever single drone order

LEAVE A REPLY Cancel reply

Latest Articles

Farming on the edge with autonomous robots

This intelligent rip-off practically hijacked a tech CEO’s Apple ID • Graham Cluley

Skydio X10D Military order marks its biggest-ever single drone order

Physicists uncover a heavy cousin of the proton at CERN’s Massive Hadron Collider

Reviving Mind Exercise After ‘Cryosleep’ Inches Nearer in Pioneering Examine

ABOUT US