Meta’s Llama 4 is now available on Workers AI
2025-04-06
Llama 4 Scout 17B Instruct is now available on Workers AI: use this multimodal, Mixture of Experts AI model on Cloudflare's serverless AI platform to build next-gen AI applications....

\n
For an illustrative example, let’s say there’s an expert that’s really good at generating code while another expert is really good at creative writing. When a request comes in to write a Fibonacci algorithm in Haskell, the router sends the input tokens to the coding expert. This means that the remaining experts might remain unactivated, so the model only needs to use the smaller, specialized neural network to solve the problem.
In the case of Llama 4 Scout, this means the model is only using one expert (17B parameters) instead of the full 109B total parameters of the model. In reality, the model probably needs to use multiple experts to handle a request, but the point still stands: an MoE model architecture is incredibly efficient for the breadth of problems it can handle and the speed at which it can handle it.
MoE also makes it more efficient to train models. We recommend reading Meta’s blog post on how they trained the Llama 4 models. While more efficient to train, hosting an MoE model for inference can sometimes be more challenging. You need to load the full model weights (over 200 GB) into GPU memory. Supporting a larger context window also requires keeping more memory available in a Key Value cache.
Thankfully, Workers AI solves this by offering Llama 4 Scout as a serverless model, meaning that you don’t have to worry about things like infrastructure, hardware, memory, etc. — we do all of that for you, so you are only one API request away from interacting with Llama 4.
\nOne challenge in building AI-powered applications is the need to grab multiple different models, like a Large Language Model (LLM) and a visual model, to deliver a complete experience for the user. Llama 4 solves that problem by being natively multimodal, meaning the model can understand both text and images.
You might recall that Llama 3.2 11b was also a vision model, but Llama 3.2 actually used separate parameters for vision and text. This means that when you sent an image request to the model, it only used the vision parameters to understand the image.
With Llama 4, all the parameters natively understand both text and images. This allowed Meta to train the model parameters with large amounts of unlabeled text, image, and video data together. For the user, this means that you don’t have to chain together multiple models like a vision model and an LLM for a multimodal experience — you can do it all with Llama 4.
\nWe are excited to partner with Meta as a launch partner to make it effortless for developers to use Llama 4 in Cloudflare Workers AI. The release brings an efficient, multimodal, highly-capable and open-source model to anyone who wants to build AI-powered applications.
Cloudflare’s Developer Platform makes it possible to build complete applications that run alongside our Llama 4 inference. You can rely on our compute, storage, and agent layer running seamlessly with the inference from models like Llama 4. To learn more, head over to our developer docs model page for more information on using Llama 4 on Workers AI, including pricing, additional terms, and acceptable use policies.
Want to try it out without an account? Visit our AI playground or get started with building your AI experiences with Llama 4 and Workers AI.
"],"published_at":[0,"2025-04-06T02:22-01:00"],"updated_at":[0,"2025-04-06T05:32:57.387Z"],"feature_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4gaBb0TAh7CU75vO5TEGfQ/a4325c83fdbb4b480940a9a1bc459cc8/image1.png"],"tags":[1,[[0,{"id":[0,"2xCnBweKwOI3VXdYsGVbMe"],"name":[0,"Developer Week"],"slug":[0,"developer-week"]}],[0,{"id":[0,"4HIPcb68qM0e26fIxyfzwQ"],"name":[0,"Developers"],"slug":[0,"developers"]}],[0,{"id":[0,"1Wf1Dpb2AFicG44jpRT29y"],"name":[0,"Workers AI"],"slug":[0,"workers-ai"]}]]],"relatedTags":[0],"authors":[1,[[0,{"name":[0,"Michelle Chen"],"slug":[0,"michelle"],"bio":[0,null],"profile_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1hrcl3aVtUbBuCMeuXETWy/93dbfbc7d41c09ba35d863312dbde89d/michelle.jpg"],"location":[0,null],"website":[0,null],"twitter":[0,"@_mchenco"],"facebook":[0,null],"publiclyIndex":[0,true]}],[0,{"name":[0,"Jesse Kipp"],"slug":[0,"jesse"],"bio":[0,null],"profile_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8yRqemLc8lLhE9Uwzd7G4/a303cca84c1840cc961831f1ea6b4213/jesse.jpg"],"location":[0,null],"website":[0,null],"twitter":[0,null],"facebook":[0,null],"publiclyIndex":[0,true]}],[0,{"name":[0,"Nikhil Kothari"],"slug":[0,"nikhil"],"bio":[0,"Director, Strategic Partnerships"],"profile_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7KZ3JdO5ODe3FLdN8ng1Nc/0dbaf23c66ad71fd5b338587ce057ae2/nikhil.jpeg"],"location":[0,"San Francisco"],"website":[0,null],"twitter":[0,null],"facebook":[0,null],"publiclyIndex":[0,true]}]]],"meta_description":[0,"Llama 4 Scout 17B Instruct is now available on Workers AI: use this multimodal, Mixture of Experts AI model on Cloudflare's serverless AI platform to build next-gen AI applications. Build products that can understand images & text, accept large context, with high performance on model quality and inference latency."],"primary_author":[0,{}],"localeList":[0,{"name":[0,"blog-english-only"],"enUS":[0,"English for Locale"],"zhCN":[0,"No Page for Locale"],"zhHansCN":[0,"No Page for Locale"],"zhTW":[0,"No Page for Locale"],"frFR":[0,"No Page for Locale"],"deDE":[0,"No Page for Locale"],"itIT":[0,"No Page for Locale"],"jaJP":[0,"No Page for Locale"],"koKR":[0,"No Page for Locale"],"ptBR":[0,"No Page for Locale"],"esLA":[0,"No Page for Locale"],"esES":[0,"No Page for Locale"],"enAU":[0,"No Page for Locale"],"enCA":[0,"No Page for Locale"],"enIN":[0,"No Page for Locale"],"enGB":[0,"No Page for Locale"],"idID":[0,"No Page for Locale"],"ruRU":[0,"No Page for Locale"],"svSE":[0,"No Page for Locale"],"viVN":[0,"No Page for Locale"],"plPL":[0,"No Page for Locale"],"arAR":[0,"No Page for Locale"],"nlNL":[0,"No Page for Locale"],"thTH":[0,"No Page for Locale"],"trTR":[0,"No Page for Locale"],"heIL":[0,"No Page for Locale"],"lvLV":[0,"No Page for Locale"],"etEE":[0,"No Page for Locale"],"ltLT":[0,"No Page for Locale"]}],"url":[0,"https://blog.cloudflare.com/meta-llama-4-is-now-available-on-workers-ai"],"metadata":[0,{"title":[0,"Meta Llama 4 now available on Workers AI"],"description":[0,"Llama 4 Scout 17B Instruct is now available on Workers AI: use this multimodal, Mixture of Experts AI model on Cloudflare's serverless AI platform to build next-gen AI applications. Build products that can understand images & text, accept large context, with high performance on model quality and inference latency."],"imgPreview":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5MBBMn6Jd4UO6WT0QZRjIB/02732115757b84b5c8dd9be0b144cd66/Meta_Llama_4_now_available_on_Workers_AI-OG.png"]}],"publicly_index":[0,true]}],"translations":[0,{"posts.by":[0,"By"],"footer.gdpr":[0,"GDPR"],"lang_blurb1":[0,"This post is also available in {lang1}."],"lang_blurb2":[0,"This post is also available in {lang1} and {lang2}."],"lang_blurb3":[0,"This post is also available in {lang1}, {lang2} and {lang3}."],"footer.press":[0,"Press"],"header.title":[0,"The Cloudflare Blog"],"search.clear":[0,"Clear"],"search.filter":[0,"Filter"],"search.source":[0,"Source"],"footer.careers":[0,"Careers"],"footer.company":[0,"Company"],"footer.support":[0,"Support"],"footer.the_net":[0,"theNet"],"search.filters":[0,"Filters"],"footer.our_team":[0,"Our team"],"footer.webinars":[0,"Webinars"],"page.more_posts":[0,"More posts"],"posts.time_read":[0,"{time} min read"],"search.language":[0,"Language"],"footer.community":[0,"Community"],"footer.resources":[0,"Resources"],"footer.solutions":[0,"Solutions"],"footer.trademark":[0,"Trademark"],"header.subscribe":[0,"Subscribe"],"footer.compliance":[0,"Compliance"],"footer.free_plans":[0,"Free plans"],"footer.impact_ESG":[0,"Impact/ESG"],"posts.follow_on_X":[0,"Follow on X"],"footer.help_center":[0,"Help center"],"footer.network_map":[0,"Network Map"],"header.please_wait":[0,"Please Wait"],"page.related_posts":[0,"Related posts"],"search.result_stat":[0,"Results {search_range} of {search_total} for {search_keyword}"],"footer.case_studies":[0,"Case Studies"],"footer.connect_2024":[0,"Connect 2024"],"footer.terms_of_use":[0,"Terms of Use"],"footer.white_papers":[0,"White Papers"],"footer.cloudflare_tv":[0,"Cloudflare TV"],"footer.community_hub":[0,"Community Hub"],"footer.compare_plans":[0,"Compare plans"],"footer.contact_sales":[0,"Contact Sales"],"header.contact_sales":[0,"Contact Sales"],"header.email_address":[0,"Email Address"],"page.error.not_found":[0,"Page not found"],"footer.developer_docs":[0,"Developer docs"],"footer.privacy_policy":[0,"Privacy Policy"],"footer.request_a_demo":[0,"Request a demo"],"page.continue_reading":[0,"Continue reading"],"footer.analysts_report":[0,"Analyst reports"],"footer.for_enterprises":[0,"For enterprises"],"footer.getting_started":[0,"Getting Started"],"footer.learning_center":[0,"Learning Center"],"footer.project_galileo":[0,"Project Galileo"],"pagination.newer_posts":[0,"Newer Posts"],"pagination.older_posts":[0,"Older Posts"],"posts.social_buttons.x":[0,"Discuss on X"],"search.icon_aria_label":[0,"Search"],"search.source_location":[0,"Source/Location"],"footer.about_cloudflare":[0,"About Cloudflare"],"footer.athenian_project":[0,"Athenian Project"],"footer.become_a_partner":[0,"Become a partner"],"footer.cloudflare_radar":[0,"Cloudflare Radar"],"footer.network_services":[0,"Network services"],"footer.trust_and_safety":[0,"Trust & Safety"],"header.get_started_free":[0,"Get Started Free"],"page.search.placeholder":[0,"Search Cloudflare"],"footer.cloudflare_status":[0,"Cloudflare Status"],"footer.cookie_preference":[0,"Cookie Preferences"],"header.valid_email_error":[0,"Must be valid email."],"search.result_stat_empty":[0,"Results {search_range} of {search_total}"],"footer.connectivity_cloud":[0,"Connectivity cloud"],"footer.developer_services":[0,"Developer services"],"footer.investor_relations":[0,"Investor relations"],"page.not_found.error_code":[0,"Error Code: 404"],"search.autocomplete_title":[0,"Insert a query. Press enter to send"],"footer.logos_and_press_kit":[0,"Logos & press kit"],"footer.application_services":[0,"Application services"],"footer.get_a_recommendation":[0,"Get a recommendation"],"posts.social_buttons.reddit":[0,"Discuss on Reddit"],"footer.sse_and_sase_services":[0,"SSE and SASE services"],"page.not_found.outdated_link":[0,"You may have used an outdated link, or you may have typed the address incorrectly."],"footer.report_security_issues":[0,"Report Security Issues"],"page.error.error_message_page":[0,"Sorry, we can't find the page you are looking for."],"header.subscribe_notifications":[0,"Subscribe to receive notifications of new posts:"],"footer.cloudflare_for_campaigns":[0,"Cloudflare for Campaigns"],"header.subscription_confimation":[0,"Subscription confirmed. Thank you for subscribing!"],"posts.social_buttons.hackernews":[0,"Discuss on Hacker News"],"footer.diversity_equity_inclusion":[0,"Diversity, equity & inclusion"],"footer.critical_infrastructure_defense_project":[0,"Critical Infrastructure Defense Project"]}]}" ssr="" client="load" opts="{"name":"PostCard","value":true}" await-children="">2025-04-06
Llama 4 Scout 17B Instruct is now available on Workers AI: use this multimodal, Mixture of Experts AI model on Cloudflare's serverless AI platform to build next-gen AI applications....
2025-03-21
Cloudflare’s Data Loss Prevention is reducing false positives by using a self-improving AI-powered algorithm, built on Cloudflare’s Developer Platform....
2025-03-20
Cloudflare’s first AI agent, Cloudy, helps make complicated configurations easy to understand for Cloudflare administrators....
2025-02-04
Today, we are launching a new dedicated “AI Insights” page on Cloudflare Radar that incorporates this graph and builds on it with additional metrics....
2024-12-24
How I used Workers AI to translate Cloudflare Stream’s auto-generated captions and what I learned along the way....
September 30, 2024 1:00 PM
Recapping all the big announcements made during 2024’s Birthday Week....
September 26, 2024 1:00 PM
Cloudflare helps you build AI applications with fast inference at the edge, optimized AI workflows, and vector database-powered RAG solutions....
July 23, 2024 3:15 PM
Cloudflare is excited to be a launch partner with Meta to introduce Workers AI support for Llama 3.1...
June 27, 2024 5:00 PM
Introducing a new way to do function calling in Workers AI by running function code alongside your inference. Plus, a new @cloudflare/ai-utils package to make getting started as simple as possible...
June 20, 2024 1:00 PM
With one click, users can now generate video captions effortlessly using Stream’s newest feature: AI-generated captions for on-demand videos and recordings of live streams...
May 22, 2024 1:00 PM
AI Gateway is an AI ops platform that provides speed, reliability, and observability for your AI applications. With a single line of code, you can unlock powerful features including rate limiting, custom caching, real-time logs, and aggregated analytics across multiple providers...
April 18, 2024 8:58 PM
We are thrilled to give developers around the world the ability to build AI applications with Meta Llama 3 using Workers AI. We are proud to be a launch partner with Meta for their newest 8B Llama 3 model...
April 02, 2024 1:01 PM
Today, we’re excited to make a series of announcements, including Workers AI, Cloudflare’s inference platform becoming GA and support for fine-tuned models with LoRAs and one-click deploys from HuggingFace. Cloudflare Workers now supports the Python programming language, and more...
April 02, 2024 1:00 PM
Workers AI now supports fine-tuned models using LoRAs. But what is a LoRA and how does it work? In this post, we dive into fine-tuning, LoRAs and even some math to share the details of how it all works under the hood...
March 14, 2024 12:30 PM
The Workers AI and AI Gateway team recently collaborated closely with security researchers at Ben Gurion University regarding a report submitted through our Public Bug Bounty program. Through this process, we discovered and fully ed a vulnerability affecting all LLM provider...
March 04, 2024 2:00 PM
Introducing AI Assistant for Security Analytics. Now it is easier than ever to get powerful insights about your web security. Use the new integrated natural language query interface to explore Security Analytics...
February 28, 2024 8:00 PM
In February 2024 we added 8 models for text generation, classification, and code generation use cases. Today, we’re back with 17 more models, focused on enabling new types of tasks and use cases ...
February 06, 2024 8:00 PM
Workers AI is now bigger and better with 8 new models and improved model performance...
December 06, 2023 2:00 PM
This is what Cloudflare has been able to do so far with OpenBMC with respect to our GPU-equipped servers...
November 23, 2023 2:00 PM
We're thrilled to announce that Stable Diffusion and Code Llama are now available as part of Workers AI, running in over 100 cities across Cloudflare’s global network....