By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
10alert.com10alert.com10alert.com
  • Threats
    • WordPress ThreatsDanger
    Threats
    A cyber or cybersecurity threat is a malicious act that seeks to damage data, steal data, or disrupt digital life in general. Cyber threats include…
    Show More
    Top News
    Not-a-Virus: What is it?
    1 year ago
    Phishing for cryptocurrencies: How bitcoins are stolen
    1 year ago
    How to avoid turning your smartphone into a spyware zoo
    1 year ago
    Latest News
    Patchstack Becomes Member Of Open Source Security Foundation
    13 hours ago
    PDF Phishing: Beyond the Bait
    16 hours ago
    Update ASAP! Critical Unauthenticated Arbitrary File Upload in MW WP Form Allows Malicious Code Execution
    19 hours ago
    Fake CVE Phishing Campaign Tricks WordPress Users Into Installing Malware
    2 days ago
  • Fix
    Fix
    Troubleshooting guide you need when errors, bugs or technical glitches might ruin your digital experience.
    Show More
    Top News
    For 0-day vulnerabilities in Windows, temporary patches
    1 year ago
    Windows 11 22H2 (build 22621.317) outs in the Release Preview Channel
    1 year ago
    How to avoid problems installing Windows 11 22H2
    1 year ago
    Latest News
    How automatically delete unused files from my Downloads folder?
    10 months ago
    Now you can speed up any video in your browser
    10 months ago
    How to restore access to a file after EFS or view it on another computer?
    10 months ago
    18 Proven Tips to Speed Up Your WordPress Site and Improve SEO | 2023 Guide
    11 months ago
  • How To
    How ToShow More
    A year in recap: Windows accessibility
    19 hours ago
    How to stop, disable, and remove any Android apps — even system ones
    3 days ago
    Bigger, Better, Cooler in a 2U1N form factor
    Bigger, Better, Cooler in a 2U1N form factor
    4 days ago
    Vulnerability in crypto wallets created online in the early 2010s
    5 days ago
    Use Windows 11 features to inspire creativity, speed up everyday tasks
    6 days ago
  • News
    News
    This category of resources includes the latest technology news and updates, covering a wide range of topics and innovations in the tech industry. From new…
    Show More
    Top News
    The most convenient way to find music from TV shows, movies and games
    1 year ago
    Easter egg “Unicorn” in Firefox
    1 year ago
    Slow internet simulation
    1 year ago
    Latest News
    How to disable news feed from Widgets on Windows 11
    17 hours ago
    How to fix performance issues after upgrading to Windows 11 23H2
    17 hours ago
    How to disable updates on Windows 10 Pro and Home
    2 days ago
    Change screen brightness on Windows 11
    4 days ago
  • Glossary
  • My Bookmarks
Reading: The best place on Region: Earth for inference
Share
Notification Show More
Aa
Aa
10alert.com10alert.com
  • Threats
  • Fix
  • How To
  • News
  • Glossary
  • My Bookmarks
  • Threats
    • WordPress ThreatsDanger
  • Fix
  • How To
  • News
  • Glossary
  • My Bookmarks
Follow US
Apps

The best place on Region: Earth for inference

Andra Smith
Last updated: 22 October
Andra Smith 1 month ago
Share
16 Min Read

The best place on Region: Earth for inference

Contents
Inference: The future of AI workloadsThe network — “just right” for inferenceThe first inference cloud, running on Region EarthWorkers AI for serverless inferenceVectorize for… storing vectors!AI Gateway for caching, rate limiting and visibility into your AI deploymentsCollaborating with Meta to bring Llama 2 to our global networkLeveraging the ONNX runtime to make moving between cloud to edge to device seamless for developersPartnering with Hugging Face to provide optimized models at your fingertipsPartnering with Databricks to make AI modelsCompliance-friendly AIBudget-friendly AIPrivacy-friendly AINo, but really — we’re just getting started

Today, Cloudflare’s Workers platform is the place over a million developers come to build sophisticated full-stack applications that previously wouldn’t have been possible.

Of course, Workers didn’t start out that way. It started, on a day like today, as a Birthday Week announcement. It may not have had all the bells and whistles that exist today, but if you got to try Workers when it launched, it conjured this feeling: “this is different, and it’s going to change things”. All of a sudden, going from nothing to a fully scalable, global application took seconds, not hours, days, weeks or even months. It was the beginning of a different way to build applications.

If you’ve played with generative AI over the past few months, you may have had a similar feeling. Surveying a few friends and colleagues, our “aha” moments were all a bit different, but the overarching sentiment across the industry at this moment is unanimous — this is different, and it’s going to change things.

Today, we’re excited to make a series of announcements that we believe will make a similar impact as Workers did in the future of computing. Without burying the lede any further, here they are:

  • Workers AI (formerly known as Constellation), running on NVIDIA GPUs on Cloudflare’s global network, bringing the serverless model to AI — pay only for what you use, spend less time on infrastructure, and more on your application.
  • Vectorize, our vector Database, making it easy, fast and affordable to index and store vectors to support use cases that require access not just to running models, but customized data too.
  • AI Gateway, giving organizations the tools to cache, rate limit and observe their AI deployments regardless of where they’re running.

But that’s not all.

Doing big things is a team sport, and we don’t want to do it alone. Like in so much of what we do, we stand on the shoulders of giants. We’re thrilled to partner with some of the biggest players in the space: NVIDIA, Microsoft, Hugging Face, Databricks, and Meta.

Our announcements today mark just the beginning of Cloudflare’s journey into the AI space, like Workers did six years ago. While we encourage you to dive into each of our announcements (you won’t be disappointed!), we also wanted to take the chance to step back and provide you with a bit of our broader vision for AI, and how these announcements fit into it.

Inference: The future of AI workloads

There are two main processes involved in AI: training and inference.

Training a generative AI model is a long-running (sometimes months-long) compute intensive process, which results in a model. Training workloads are therefore best suited for running in traditional centralized cloud locations. Given the recent challenges in being able to obtain long-running access to GPUs, resulting in companies going multi-cloud, we’ve talked about the ways in which R2 can provide an essential service that eliminates egress fees for the training data to be accessed from any compute cloud. But that’s not what we’re here to talk about today.

While training requires many resources upfront, the much more ubiquitous AI-related compute task is inference. If you’ve recently asked ChatGPT a question, generated an image, translated some text, then you’ve performed an inference task. Since inference is required upon every single invocation (rather than just once), we expect that inference will become the dominant AI-related workload.

If training is best suited for a centralized cloud, then what is the best place for inference?

The network — “just right” for inference

The defining characteristic of inference is that there’s usually a user waiting on the other end of it. That is, it’s a latency sensitive task.

The best place, you might think, for a latency sensitive task is on the device. And it might be in some cases, but there are a few problems. First, hardware on devices is not nearly as powerful. Battery life.

On the other hand, you have centralized cloud compute. Unlike devices, the hardware running in centralized cloud locations has nothing if not horsepower. The problem, of course, is that it’s hundreds of milliseconds away from the user. And sometimes, they’re even across borders, which presents its own set of challenges.

So devices are not yet powerful enough, and centralized cloud is too far away. This makes the network the goldilocks of inference. Not too far, with sufficient compute power — just right.

The first inference cloud, running on Region Earth

One lesson we learned building our developer platform is that running applications at network scale not only helps optimize performance and scale (though obviously that’s a nice benefit!), but even more importantly, creates the right level of abstraction for developers to move fast.

Workers AI for serverless inference

Kicking things off with our announcement of Workers AI, we’re bringing the first truly serverless GPU cloud, to its perfect match — Region Earth. No machine learning expertise, no rummaging for GPUs. Just pick one of our provided models, and go.

We’ve put a lot of thought into designing Workers AI to make the experience of deploying a model as smooth as possible.

And if you’re deploying any models in the year 2023, chances are, one of them is an LLM.

Vectorize for… storing vectors!

To build an end-to-end AI-operated chat bot, you also need a way to present the user with a UI, parse the corpus of information you want to pass it (for example your product catalog), use the model to convert it into embeddings — and store them somewhere. Up until today, we offered the products you needed for the first two, but the latter — storing embeddings — requires a unique solution: a vector database.

Just as when we announced Workers, we soon after announced Workers KV — there’s little you can do with compute, without access to state. The same is true of AI — to build meaningful AI use cases, you need to give AI access to state. This is where a vector database comes into play, and why today we’re also excited to announce Vectorize, our own vector database.

AI Gateway for caching, rate limiting and visibility into your AI deployments

At Cloudflare, when we set out to improve something, the first step is always to measure it — if you can’t measure it, how can you improve it? When we heard about customers struggling to reign in AI deployment costs, we thought about how we would approach it — measure it, then improve it.

Our AI Gateway helps you do both!

Real-time observation capabilities empower proactive management, making it easier to monitor, debug, and fine-tune AI deployments. Leveraging it to cache, rate limit, and monitor AI deployments is essential for optimizing performance and managing costs effectively. By caching frequently used AI responses, it reduces latency and bolsters system reliability, while rate limiting ensures efficient resource allocation, mitigating the challenges of spiraling AI costs.

Collaborating with Meta to bring Llama 2 to our global network

Until recently, the only way to have access to an LLM was through calls to proprietary models. Training LLMs is a serious investment — in time, computing, and financial resources, and thus not something that’s accessible to most developers. Meta’s release of Llama 2, an open-source LLM, has presented an exciting shift, allowing developers to run and deploy their own LLMs. Except of course, one small detail — you still have to have access to a GPU to do so.

By making Llama 2 available as a part of the Workers AI catalog, we look forward to giving every developer access to an LLM — no configuration required.

Having a running model is, of course, just one component of an AI application.

Leveraging the ONNX runtime to make moving between cloud to edge to device seamless for developers

While the edge may be the optimal location for solving many of these problems, we do expect that applications will continue to be deployed at other locations along the spectrum of device, edge and centralized cloud.

Take for example, self-driving cars — when you’re making decisions where every millisecond matters, you need to make these decisions on the device. Inversely, if you’re looking to run hundred-billion parameter versions of models, the centralized cloud is going to be better suited for your workload.

The question then becomes: how do you navigate between these locations smoothly?

Since our initial release of Constellation (now called Workers AI), one technology we were particularly excited by was the ONNX runtime. The ONNX runtime creates a standardized environment for running models, which makes it possible to run various models across different locations.

We already talked about the edge as a great place for running inference itself, but it’s also great as a routing layer to help guide workloads smoothly across all three locations, based on the use case, and what you’re looking to optimize for — be it latency, accuracy, cost, compliance, or privacy.

Partnering with Hugging Face to provide optimized models at your fingertips

There’s nothing of course that can help developers go faster than meeting them where they are, so we are partnering with Hugging Face to bring serverless inference to available models, right where developers explore them.

Partnering with Databricks to make AI models

Together with Databricks, we will be bringing the power of MLflow to data scientists and engineers. MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, and this partnership will make it easier for users to deploy and manage ML models at scale. With this partnership, developers building on Cloudflare Workers AI will be able to leverage MLFlow compatible models for easy deployment into Cloudflare’s global network. Developers can use MLflow to efficiently package, implement, deploy and track a model directly into Cloudflare’s serverless developer platform.

AI that doesn’t keep your CIO or CFO or General Counsel up at night

Things are moving quickly in AI, and it’s important to give developers the tools they need to get moving, but it’s hard to move fast when there are important considerations to worry about. What about compliance, costs, privacy?

Compliance-friendly AI

Much as most of us would prefer not to think about it, AI and data residency are becoming increasingly regulated by governments. With governments requiring that data be processed locally or that their residents’ data be stored in-country, businesses have to think about that in the context of where inference workloads run as well. While with regard to latency, the network edge provides the ability to go as wide as possible. When it comes to compliance, the power of a network that spans 300 cities, and an offering like our Data Localization Suite, we enable the granularity required to keep AI deployments local.

Budget-friendly AI

Talking to many of our friends and colleagues experimenting with AI, one sentiment seems to resonate — AI is expensive. It’s easy to let costs get away before even getting anything into production or realizing value from it. Our intent with our AI platform is to make costs affordable, but perhaps more importantly, only charge you for what you use. Whether you’re using Workers AI directly, or our AI gateway, we want to provide the visibility and tools necessary to prevent AI spend from running away from you.

Privacy-friendly AI

If you’re putting AI front and center of your customer experiences and business operations, you want to be reassured that any data that runs through it is in safe hands. As has always been the case with Cloudflare, we’re taking a privacy-first approach. We can assure our customers that   we will not use any customer data passing through Cloudflare for inference to train large language models.

No, but really — we’re just getting started

We’re just getting started with AI, folks, and boy, are we in for a wild ride! As we continue to unlock the benefits of this technology, we can’t help but feel a sense of awe and wonder at the endless possibilities that lie ahead. From revolutionizing healthcare to transforming the way we work, AI is poised to change the game in ways we never thought possible. So buckle up, folks, because the future of AI is looking brighter than ever – and we can’t wait to see what’s next!

This wrap up message may have been generated by AI, but the sentiment is genuine — this is just the beginning, and we can’t wait to see what you build.


Source: cloudflare.com

Translate this article

TAGGED: Cloudflare, Software
Andra Smith October 22, 2023 October 22, 2023
Share This Article
Facebook Twitter Reddit Telegram Email Copy Link Print

STAY CONECTED

24.8k Followers Like
253.9k Followers Follow
33.7k Subscribers Subscribe
124.8k Members Follow

LAST 10 ALERT

Patchstack Becomes Member Of Open Source Security Foundation
Patchstack Becomes Member Of Open Source Security Foundation
Wordpress Threats 16 hours ago
PDF Phishing: Beyond the Bait
Threats 19 hours ago
A year in recap: Windows accessibility
Windows 19 hours ago
How to disable news feed from Widgets on Windows 11
News 20 hours ago
How to fix performance issues after upgrading to Windows 11 23H2
News 20 hours ago

You Might Also Like

Patchstack Becomes Member Of Open Source Security Foundation
Wordpress Threats

Patchstack Becomes Member Of Open Source Security Foundation

16 hours ago
Threats

Your Smart Coffee Maker is Brewing Up Trouble

4 days ago
Bigger, Better, Cooler in a 2U1N form factor
Apps

Bigger, Better, Cooler in a 2U1N form factor

4 days ago
Earn up to $10,000 for Vulnerabilities in WordPress Software
Wordpress Threats

Earn up to $10,000 for Vulnerabilities in WordPress Software

4 days ago
Show More

Related stories

Several Critical Vulnerabilities including Privilege Escalation, Authentication Bypass, and More Patched in UserPro WordPress Plugin
BridesMaid – neuron writes toasts For those very occasions when you need to give out a powerful
The other day Yandex pleased us with the announcement of a new Midi station – an excellent reason to listen
REMIX – remixes of pictures from neural networksCreate, share and correct works
How to download Diablo IV for free and absolutely legallyBlizzard has opened a free
Rostelecom employees were forced to abandon Android and iOS in favor of Aurora.
Previous Next

10 New Stories

Update ASAP! Critical Unauthenticated Arbitrary File Upload in MW WP Form Allows Malicious Code Execution
Fake CVE Phishing Campaign Tricks WordPress Users Into Installing Malware
How to disable updates on Windows 10 Pro and Home
How to stop, disable, and remove any Android apps — even system ones
Patchstack Alliance Bounty Program Events for December
Your Smart Coffee Maker is Brewing Up Trouble
Previous Next
Hot News
Patchstack Becomes Member Of Open Source Security Foundation
PDF Phishing: Beyond the Bait
A year in recap: Windows accessibility
How to disable news feed from Widgets on Windows 11
How to fix performance issues after upgrading to Windows 11 23H2
10alert.com10alert.com
Follow US
© 10 Alert Network. All Rights Reserved.
  • Privacy Policy
  • Contact
  • Customize Interests
  • My Bookmarks
  • Glossary
Go to mobile version
adbanner
AdBlock Detected
Our site is an advertising supported site. Please whitelist to support our site.
Okay, I'll Whitelist
Welcome Back!

Sign in to your account

Lost your password?