Understanding the Total Cost of Ownership
Whether you're just beginning your journey in Azure or are already managing workloads in the cloud, it's essential to ground your strategy in proven guidance. The Microsoft Cloud Adoption Framework for Azure offers a comprehensive set of best practices, documentation, and tools to help you align your cloud adoption efforts with business goals. One of the foundational steps in this journey is understanding the financial implications of cloud migration.

When evaluating the migration of workloads to Azure, calculating the Total Cost of Ownership (TCO) is a crucial step. TCO is a comprehensive metric that includes all cost components over the life of the resource. A well-constructed TCO analysis provides insights that aid decision-making and drive financial efficiencies. By understanding the full costs associated with moving to Azure, you can make informed choices that align with your business goals and budget. Here is a breakdown of the main elements you need to build your own TCO:

1. Current infrastructure configuration:
- Servers: details about your existing servers, including the number of servers, their specifications (CPU, memory, storage), and operating systems.
- Databases: information about your current databases, such as the type, size, and any associated licensing costs.
- Storage: the type and amount of storage you are currently using, including any redundancy or backup solutions.
- Network traffic: outbound network traffic and any associated costs.

2. Azure environment configuration:
- Virtual machines (VMs): the Azure VMs that match your current server specifications, based on CPU, memory, storage, and region.
- Storage options: the type of storage (e.g., Standard HDD, Premium SSD), access tiers, and redundancy options that align with your needs.
- Networking: networking components, including virtual networks, load balancers, and bandwidth requirements.

3. Operational costs:
- Power and cooling: the costs associated with power and cooling for your on-premises infrastructure.
- IT labor: the costs of IT labor required to manage and maintain your current infrastructure.
- Software licensing: any software licensing costs that will be incurred in both the current and Azure environments.

Once you have clarity on these inputs, you can complement your analysis with other tools depending on your needs. The Azure Pricing Calculator is well suited to granular cost estimation for individual Azure services and products. If the intent is to estimate cost and savings during a migration, however, the Azure Migrate business case feature is the preferred approach: it lets you perform a detailed financial analysis (TCO/ROI) for the best path forward and assess readiness to move workloads to Azure with confidence.

Understand your Azure costs

The Azure pricing calculator is a free cost management tool that allows users to understand and estimate the costs of Azure services and products. It is the only unauthenticated experience that lets you configure and budget the expected cost of deploying solutions in Azure. The Azure pricing calculator is key to properly adopting Azure. Whether you are in a discovery phase, trying to figure out what to use and which offers to apply, or in a post-purchase phase, trying to optimize your environment and see your negotiated prices, the Azure pricing calculator serves both new users and existing customers. It allows organizations to plan and forecast cloud expenses, evaluate different configurations and pricing models, and make informed decisions about service selection and deployment options.

Decide, plan, and execute your migration to Azure

Azure Migrate is Microsoft’s free platform for migrating to and modernizing in Azure.
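To make the TCO elements listed above concrete, they can be rolled up into a simple side-by-side annual estimate. The sketch below uses entirely hypothetical figures and category names; substitute numbers from your own discovery data and pricing estimates.

```python
# Minimal TCO comparison sketch. All figures are hypothetical placeholders;
# replace them with numbers from your own discovery and pricing estimates.

def annual_tco(components: dict) -> float:
    """Sum annual cost components (any currency) into a single TCO figure."""
    return sum(components.values())

on_prem = {
    "server_hardware_amortized": 40_000,
    "storage_and_backup": 12_000,
    "power_and_cooling": 8_000,
    "it_labor": 30_000,
    "software_licensing": 15_000,
}

azure = {
    "virtual_machines": 45_000,
    "managed_storage": 9_000,
    "networking_egress": 3_000,
    "it_labor": 18_000,           # reduced: no hardware maintenance
    "software_licensing": 10_000, # e.g. after applying hybrid benefits
}

on_prem_total = annual_tco(on_prem)
azure_total = annual_tco(azure)
savings = on_prem_total - azure_total
print(f"On-premises: {on_prem_total:,}  Azure: {azure_total:,}  Savings: {savings:,}")
```

A real TCO model would amortize multi-year costs and discount future cash flows, but even a flat annual roll-up like this makes the comparison between environments visible.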
It provides capabilities for discovery, business case (TCO/ROI) analysis, assessment, planning, and migration in a workload-agnostic manner. Customers need an Azure account and a migration project created in the Azure portal to get started. Azure Migrate supports various migration scenarios, including VMware and Hyper-V virtual machines (VMs), physical servers, databases, and web apps. The service offers accurate appliance-based and manual discovery options to cater to customer needs.

The Azure Migrate process consists of three main phases: Decide, Plan, and Execute. In the Decide phase, organizations discover their IT estate through several supported methods and can generate a dependency map for their applications to help collocate all resources belonging to an application. Using the discovered data, they can also estimate costs and savings through the business case (TCO/ROI) feature. In the Plan phase, customers assess readiness to migrate and get right-sized recommendations for Azure targets and the tools to use for their migration strategy (IaaS/PaaS). Users can also create a migration plan consisting of iterative “waves,” where each wave contains all dependent workloads for the applications to be moved during a maintenance window. Finally, the Execute phase focuses on the actual migration of workloads, starting with a test environment in Azure and proceeding in a phased manner to ensure a non-disruptive and efficient transition.

A crucial step in the Azure Migrate process is building a business case prior to the move, which helps organizations understand the value Azure can bring to their business. The business case capability highlights the total cost of ownership (TCO) with discounts and compares cost and savings between on-premises and Azure, including end-of-support (EOS) Windows OS and SQL versions.
It provides year-on-year cash flow analysis with resource utilization insights and identifies quick wins for migration and modernization, with an emphasis on long-term cost savings by transitioning from a capital expenditure model to an operating expenditure model, paying only for what is used.

Understanding the Total Cost of Ownership (TCO) is essential for making informed decisions when migrating workloads to Azure. By thoroughly evaluating all cost components, including infrastructure, operational, facilities, licensing and migration costs, organizations can optimize their cloud strategy and achieve financial efficiencies. Utilize tools like the Azure Pricing Calculator and Azure Migrate to gain comprehensive insights and ensure a smooth transition to the cloud.

How to control your Azure costs with Governance and Azure Policy
Azure resources can be configured in many ways, including ways that affect their performance, security, reliability, available features and, ultimately, cost. The challenge is that all these resources and configurations are completely available to us by default. As long as someone has permission, they can create any resource and configuration they like. This implicit “anything goes” gives our technical teams the freedom to decide what’s best. Like a kid in a toy shop, they will naturally favour the biggest, fastest and coolest toys. The immediate risk, of course, is building beyond business requirements: too much SKU, too much resilience, too much performance and too high a cost. Left unchecked, we risk increasingly challenging, long-term issues:

- Over-delivering will quickly become the norm.
- Excessive resource configurations will become the habitual default in all environments.
- Teams will become mis-aligned from wider business requirements.
- Teams will become used to working in a frictionless environment, and challenge any restrictions.
- FinOps teams will be stuck in endless cost optimisation work.

You may already be feeling the pain. Trapped in a cycle of repetitive, reactive cost optimisation work, seeing the same repeat offenders and looking for a way out. To break (or prevent) the cycle, a new approach is needed. We must switch priorities from detection and removal to prevention and control. We must keep waste out. We must avoid over-provisioning. We can achieve this with governance.

What is governance

Governance is a collection of rules, processes and tools that control how an organization consumes IT resources. It ensures our teams deploy resources that align to certain business goals, like security, cost, resource management and compliance. Governance rules are like the rules of a board game. They define how the game should be played, no matter who is playing. This is important.
It aligns everyone to our organization's rules regardless of role, position, seniority and authority. It helps ensure people play by the rules rather than by their own rules. Try playing Monopoly with no rules. What’s going to happen? I will pass Go, and I will collect 200 dollars.

For Microsoft Azure, and the cloud in general, governance is centered around controlling how resources can and cannot be configured. Storage Accounts should be configured like this. Virtual Machines must be configured like that. Disks can’t be configured with this. It's as much about keeping the wrong configurations out as keeping the right configurations in. When we enforce configurations that meet our goals and restrict those that don’t, we drastically increase our chance of success.

Why governance matters for FinOps

Almost all over-provisioning and waste can be traced back to how a resource is configured. From SKU to size, redundancy and additional features, if it’s not needed it’s being wasted. That’s all over-provisioning and waste is: resources, properties and values that we don’t need.

- Too much SKU, like Premium Disks instead of Standard HDD/SSD.
- Too much redundancy, like Storage Accounts with GRS when LRS is fine.
- Too many features, like App Gateways with WAF where the WAF is disabled.

Have a think for a moment. What over-provisioning have you seen in the past? Was it one or two resource properties causing the problems? Whatever you’ve seen, with governance we can stop it happening again. When we control how resources get configured, we can control over-provisioning and waste, too.
We can determine the configurations we don’t need through our optimization efforts, and then create rules that define the configurations we do need:

- “We don’t need Premium SSD disks.” becomes “Disks must be Standard HDD/SSD.”
- “We don’t need Storage Accounts with GRS.” becomes “Storage Accounts must use LRS.”
- “We don’t need WAF-enabled Application Gateways.” becomes “Application Gateways should be Standard SKUs.”

These rules effectively remove the option to build beyond requirements. They help teams avoid building too much or too big, stay within their means, hold them a bit more accountable and protect us from future overspend. Detection becomes prevention. Removal becomes control. Over time, we will:

- Help our teams deliver just enough.
- Raise and improve awareness of over-configuration and waste.
- Help keep waste out once it’s found.
- Reduce the chances of over-provisioning in future.
- Steadily reduce the need for ongoing cost optimisation efforts.
- Free up time for other FinOps work.

This is why governance is a natural evolution from cost optimization, and why it’s critical for FinOps teams who want to be more proactive and spend less time cleaning up after tech teams.

How can we natively govern Microsoft Azure?

In Microsoft Azure, we can use the native governance service, Azure Policy, to help control our environments. We can embed our governance rules into Azure itself and have Azure Policy do the heavy lifting of checking, reporting and enforcing. Azure Policy has many useful features:

- Supports over 74,000 resource properties, including all that generate costs.
- Can audit resources, deny deployments and even auto-remediate resources as they come into Azure.
- Provides easy reporting of compliance issues, saving time on manual checks.
- Checks every deployment from every source. From Portal to Terraform, it’s got you covered.
- Supports different scopes, from Resource Groups to Management Groups, allowing policies to be used at any scale.
- Supports parameters, making policies re-usable and quick to modify when requirements change.
- Exemptions can be used on resources we want to ignore for now.
- Supports different enforcement modes, for safe rollout of new policies.
- It comes at no additional cost. Free!

These features make Azure Policy an extremely flexible and powerful tool that can help control resources, properties and values at any scale. We can:

- Create policies for almost any cost-impacting value. SKUs, redundancy tiers, instance sizes, you name it.
- Use different effects based on how ‘strict’ the rule should be. For example, we can use Deny (of resource creation) for resources missing “must have” attributes, and Audit to check whether resources are still compliant with “should have” attributes.
- Use a combination of effects, enforcement modes and exemptions to control the rollout of new policies.
- Reuse the policies across multiple environments (like development versus production), with different values and effects depending on each environment's needs.
- Quickly change the values when needed. When requirements change, the parameters can be modified with little effort.

How to avoid unwanted friction

A common concern with governance is that it will create friction, interrupt work and slow teams down. This is a valid concern, and Azure Policy’s features allow for a controlled and safe rollout. With a good plan there is no need to worry. Consider the following:

- Start with Audit-only policies and non-production environments.
- Start with simpler resources and regular/top offenders.
- Test policies in sandboxes before using them in live environments.
- Use the ‘Do not enforce’ mode when first assigning Deny policies. This treats them as Audit-only, allowing review before they are enforced.
- Always parameterize effects and values, for quick modification when needed.
- Use exemptions when there are sensitive resources that are best ignored for now.
- Work with your teams and agree a fair and balanced approach.
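To make this concrete, here is a minimal sketch of a custom policy definition of the kind described above: it restricts managed disk SKUs to an allowed list, with both the list and the effect exposed as parameters so the same definition can be reused across environments. The structure follows the standard Azure Policy definition format, but the display name and default values are illustrative; treat it as a starting point and test before use.

```json
{
  "properties": {
    "displayName": "Allowed disk SKUs",
    "mode": "All",
    "parameters": {
      "allowedSkus": {
        "type": "Array",
        "metadata": { "displayName": "Allowed disk SKUs" },
        "defaultValue": [ "Standard_LRS", "StandardSSD_LRS" ]
      },
      "effect": {
        "type": "String",
        "allowedValues": [ "Audit", "Deny", "Disabled" ],
        "defaultValue": "Audit"
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          { "field": "type", "equals": "Microsoft.Compute/disks" },
          {
            "not": {
              "field": "Microsoft.Compute/disks/sku.name",
              "in": "[parameters('allowedSkus')]"
            }
          }
        ]
      },
      "then": { "effect": "[parameters('effect')]" }
    }
  }
}
```

Assigned with the default Audit effect, this only reports non-compliant disks; switching the effect parameter to Deny blocks new Premium SSD deployments outright, which is exactly the Audit-first rollout pattern suggested above.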
Governance is for everyone and should include everyone where possible. The biggest challenge of all may be breaking habits formed over years of freedom in the cloud. It’s natural to resist change, especially when it takes away our freedom. Remember: friction where it’s needed, interruption where it’s needed, slowdown where it’s needed. The key to getting teams on board is delivering the right message. Why are we doing this? How will they benefit? How does it help them? How could they be impacted if you do nothing? This needs to be more than “to meet our FinOps goals”. That’s your goal, not theirs. They won’t care. Try something like:

“We keep seeing over-provisioning and waste, and are spending an additional ‘X amount’ of time and money trying to remove it. This is now impacting our ability to invest properly in our IT teams, affecting other departments and impacting our overall growth. If we can get overspend reduced and under control, we can re-invest where you need it: tooling, people, training and anything else that makes your lives better. We want to implement governance rules and policies that will prevent these issues recurring. With your insights and support we can achieve this faster, avoid unwanted impact, and re-invest back into our IT teams once done. Sound good to you?!”

This is far more compelling and gives them a reason to get on board and help out.

How to start your FinOps governance journey

Making the jump from workload optimization into governance might initially sound challenging, but it’s actually pretty straightforward. Consider the typical workload optimization cycle:

- Discover potential misconfiguration, optimization and waste cleanup opportunities.
- Compare to actual business requirements.
- Optimize the workload to meet those business requirements.

A governance practice extends this to the following:

- Identify potential misconfigurations, optimization and waste cleanup opportunities.
- Compare to actual business requirements.
- Optimize the workload to meet those business requirements.
- Create an Azure Policy based on how the resource should have originally been configured, and how it should remain in future.

That's it, one extra step. Most of the hard work has already happened in steps 1-3, in the workload optimization we’ve already been doing. Step 4 simply turns the optimization into a rule that says “this resource must be like this from now on”, preventing it from happening again. Let's do it again with a real resource, an Azure Disk:

1. Identify Premium SSD Disks in a non-production environment.
2. Compare to business requirements, which confirm Standard HDD is fine.
3. Change the Disk SKU from Premium to Standard HDD.
4. Create an Azure Policy that only allows Disks with Standard HDD in the environment and denies other SKUs.

Done. No more Premium SSDs in this environment again. Prevention and control. The real work lies in being able to understand and identify how resources become over-provisioned and wasteful. Until we can, we will struggle to optimize, let alone govern.

The Wasteful Eight

There are so many resources and properties available that understanding all the ways they can create waste can be challenging. Fortunately, we can group resource properties into eight main categories, which makes our efforts a bit easier. Let's look at the Wasteful Eight:

- Over-provisioned SKUs: Disks with Premium SSD instead of Standard HDD/SSD; App Service Plans with Premium SKU instead of Standard; Azure Bastion with Premium SKU instead of Developer.
- Too much redundancy: Storage Accounts configured with GRS when LRS is fine; Recovery Services Vaults with GRS when LRS is fine; SQL Databases with Zone Redundancy enabled.
- Too large / too many instances: Azure VMs with too many CPUs; SQL Databases with too many vCores/DTUs; Disks over 1024 GB.
- Supports auto-scaling/serverless, but isn't using it: Application Gateways without auto-scaling enabled; App Service Plans without auto-scaling; SQL Databases using fixed provisioning instead of Serverless or Elastic Pools.
- Too many backups: backups that are too frequent; backups with too-long retention periods; non-prod backups with similar retention to prod.
- Too much logging: logging enabled in non-prod; log retention too long; logging to Log Analytics instead of Storage Accounts; Log Analytics not using cheaper table plans.
- Extra features that are disabled or not being used: Application Gateways with the WAF SKU but the WAF disabled; Azure Firewall with Premium SKU but IDPS disabled; Storage Accounts with SFTP enabled but not used.
- Orphaned/unused: unattached Disks; empty App Service Plans; unattached NAT Gateways.

Remember, it's only wasteful if you don't have a business need for it, like too much redundancy in a non-production/development environment. In a production environment, you're likely to need premium disks or SKUs, GRS, and longer logging and backup retention periods. Governance is about reducing spend where you don't need it, freeing up money to spend where you do need it, on better redundancy, faster response times and so on.

All resources fall somewhere in the above categories, and a single resource can appear in most of them. For example, an Application Gateway can:

- Have an over-provisioned/unused SKU (WAF vs Standard).
- Have auto-scaling disabled.
- Have too many instances.
- Have excessive logging enabled.
- Have the WAF SKU, but the WAF for some reason disabled.
- Be orphaned, by having no backend VMs.

Breaking down any resource like this will uncover most of its cost-impacting properties and give us a good idea of what to focus on. A few outliers are inevitable, but the vast majority will be covered. Let's explore the Application Gateway examples further: the reasons each item is wasteful, and the policies we might consider in a non-production environment.
I’ve also included links to example Azure Policy definitions (test before use!).

- Discovery: Application Gateway has the WAF SKU but doesn’t need it; we use another firewall product. Policy: Allowed Application Gateway SKUs (Standard). Effect: Deny.
- Discovery: Application Gateway isn’t configured with auto-scaling, creating inefficient use of instances. Auto-scaling improves efficiency by scaling up and down as demand changes; manual scaling is inefficient. Policy: Application Gateways should be configured with auto-scaling. Effect: Deny.
- Discovery: Application Gateway min/max instance counts are higher than needed. Setting min/max instance thresholds stops them being set too high, particularly the min count, which might not need more than one instance. Policy: Allowed App Gateway min/max instance counts (min: 1, max: 2). Effect: Deny.
- Discovery: Non-prod Application Gateways have logging enabled when it’s not needed; we don't have usage that needs to be logged in non-production environments. Policy: Non-prod Application Gateways should avoid logging. Effect: Deny.
- Discovery: Application Gateway has a WAF but it’s disabled. A disabled WAF is doing nothing yet is still paid for; either use it, or change the tier to Standard to reduce costs. Policy: Application Gateway WAF is disabled. Effect: Audit.
- Discovery: Application Gateway has no backend pool resources, indicating an orphaned/unused App Gateway that should be removed. Policy: Application Gateway has an empty backend pool and appears orphaned. Effect: Audit.

Now this might seem a bit over the top. Do we really need to be controlling our App Gateway min/max scaling counts? It depends. If you have a genuine problem with too many instances then yes, you probably should. The point is, you can if you need to. This simply demonstrates how powerful governance and Azure Policy can be at controlling how resources are used. A more likely starting point will be things like SKUs, sizes, redundancy tiers and logging.
These are the high-risk, high-impact areas you’ve probably seen before and want to avoid again. Once you exhaust those, it's time to jump into Cost Management and explore your most expensive resources and services. Explore the billing meters to see how each resource's costs are broken down. This is where your money is going and where your governance rules will have the biggest impact.

Where to find Azure Policies

If you want to use Azure Policy you're going to need some policy definitions. A definition is your governance rule defined in Azure. It tells Azure what configurations you do and don't want, and how to deal with problems. It's recommended that you start with some of the built-in policies before creating your own. These are provided and maintained by Microsoft, and available inside Azure Policy ready to be applied. Fortunately, there are plenty of policies to choose from: built-in, community provided, Azure Landing Zone related and a few of my own:

- Azure Built-in Policy Repo: https://github.com/Azure/azure-policy
- Azure Community Policy Repo: https://github.com/Azure/Community-Policy
- Azure Landing Zones Policies: https://github.com/Azure/Enterprise-Scale/blob/main/docs/wiki/ALZ-Policies.md
- My stuff: https://github.com/aluckwell/Azure-Cost-Governance

Making the search even easier is the AzAdvertizer. This handy tool brings thousands of policies into a single location, with easy search and filter functionality to help find useful ones. It even includes 'Deploy to Azure' links for quick deployment.

AzAdvertizer: https://www.azadvertizer.net/azpolicyadvertizer_all.html

Of the thousands of policies in AzAdvertizer, the list below is a great starting point for FinOps. These are all built-in, ready to go and will help you get familiar with how Azure Policy works:

- Not Allowed Resource Types: block the creation of resources you don't need; helps control when resource types can/can't be used. https://www.azadvertizer.net/azpolicyadvertizer/6c112d4e-5bc7-47ae-a041-ea2d9dccd749.html
- Allowed virtual machine size SKUs: allow the use of specific VM SKUs and sizes, and block SKUs that are too big or not fit for your use case. https://www.azadvertizer.net/azpolicyadvertizer/cccc23c7-8427-4f53-ad12-b6a63eb452b3.html
- Allowed App Service Plan SKUs: allow the use of specific App Service Plan SKUs; block SKUs that are too big or not fit for your use case. https://www.azadvertizer.net/azpolicyadvertizer/27e36ba1-7f72-4a8e-b981-ef06d5c78c1a.html
- [Preview]: Do not allow creation of Recovery Services vaults of chosen storage redundancy: avoid Recovery Services Vaults with too much redundancy; if you don't need GRS, block it. https://www.azadvertizer.net/azpolicyadvertizer/8f09fda1-91a2-4e14-96a2-67c6281158f7.html
- Storage accounts should be limited by allowed SKUs: avoid too much redundancy and performance when it's not needed. https://www.azadvertizer.net/azpolicyadvertizer/7433c107-6db4-4ad1-b57a-a76dce0154a1.html
- Configure Azure Defender for Servers to be disabled for resources (resource level) with the selected tag: disable Defender for Servers on virtual machines that don't need it, helping control the rollout of Defender for Servers. https://www.azadvertizer.net/azpolicyadvertizer/080fedce-9d4a-4d07-abf0-9f036afbc9c8.html
- Unused App Service plans driving cost should be avoided: highlight when App Service Plans are 'orphaned'; either put them to use or delete them ASAP. https://www.azadvertizer.net/azpolicyadvertizer/Audit-ServerFarms-UnusedResourcesCostOptimization.html

New policies are always being added, and existing policies improved (see the versioning). Check back occasionally for changes and new additions that might be useful. When you get the itch to create your own, I'd suggest watching the following videos to understand the nuts and bolts of Azure Policy, and then moving on to Microsoft Learn for further reading.
- https://www.youtube.com/watch?v=4wGns611G4w
- https://www.youtube.com/watch?v=fhIn_kHz4hk
- https://learn.microsoft.com/azure/governance/policy/overview

Good luck!

Managing Azure OpenAI costs with the FinOps toolkit and FOCUS: Turning tokens into unit economics
By Robb Dilallo

Introduction

As organizations rapidly adopt generative AI, Azure OpenAI usage is growing—and so are the complexities of managing its costs. Unlike traditional cloud services billed per compute hour or storage GB, Azure OpenAI charges based on token usage. For FinOps practitioners, this introduces a new frontier: understanding AI unit economics and managing costs where the consumed unit is a token. This article explains how to leverage the Microsoft FinOps toolkit and the FinOps Open Cost and Usage Specification (FOCUS) to gain visibility, allocate costs, and calculate unit economics for Azure OpenAI workloads.

Why Azure OpenAI cost management is different

AI services break many traditional cost management assumptions:

- Billed by token usage (input + output tokens).
- Model choices matter (e.g., GPT-3.5 vs. GPT-4 Turbo vs. GPT-4o).
- Prompt engineering impacts cost (longer context = more tokens).
- Bursty usage patterns complicate forecasting.

Without proper visibility and unit cost tracking, it's difficult to optimize spend or align costs to business value.

Step 1: Get visibility with the FinOps toolkit

The Microsoft FinOps toolkit provides pre-built modules and patterns for analyzing Azure cost data. Key tools include:

- Microsoft Cost Management exports: export daily usage and cost data in a FOCUS-aligned format.
- FinOps hubs: an infrastructure-as-code solution to ingest, transform, and serve cost data.
- Power BI templates: pre-built reports conformed to FOCUS for easy analysis.

Pro tip: Start by connecting your Microsoft Cost Management exports to a FinOps hub. Then use the toolkit’s Power BI FOCUS templates to begin reporting. Learn more about the FinOps toolkit.

Step 2: Normalize data with FOCUS

The FinOps Open Cost and Usage Specification (FOCUS) standardizes billing data across providers—including Azure OpenAI.
FOCUS column | Purpose | Azure Cost Management field:

- ServiceName | cloud service (e.g., Azure OpenAI Service) | ServiceName
- ConsumedQuantity | number of tokens consumed | Quantity
- PricingUnit | unit type, should align to "tokens" | DistinctUnits
- BilledCost | actual cost billed | CostInBillingCurrency
- ChargeCategory | identifies consumption vs. reservation | ChargeType
- ResourceId | links to specific deployments or apps | ResourceId
- Tags | maps usage to teams, projects, or environments | Tags
- UsageType / usage details | further SKU-level detail | Sku Meter Subcategory, Sku Meter Name

Why it matters: Azure’s native billing schema can vary across services and time. FOCUS ensures consistency and enables cross-cloud comparisons.

Tip: If you use custom deployment IDs or user metadata, apply them as tags to improve allocation and unit economics. Review the FOCUS specification.

Step 3: Calculate unit economics

Unit cost per token = BilledCost ÷ ConsumedQuantity

Real-world example: Calculating unit cost in Power BI

A recent Power BI report breaks down Azure OpenAI usage by:

- SKU Meter Category → e.g., Azure OpenAI
- SKU Meter Subcategory → e.g., gpt 4o 0513 Input global Tokens
- SKU Meter Name → detailed SKU info (input/output, model version, etc.)

GPT model | Usage type | Effective cost:

- gpt 4o 0513 Input global Tokens | Input | $292.77
- gpt 4o 0513 Output global Tokens | Output | $23.40

Unit cost formula: Unit Cost = EffectiveCost ÷ ConsumedQuantity

Power BI measure example:

Unit Cost = SUM(EffectiveCost) / SUM(ConsumedQuantity)

Pro tip: Break out input and output token costs by model version to:

- Track which workloads are driving spend.
- Benchmark cost per token across GPT models.
- Attribute costs back to teams or product features using Tags or ResourceId.
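The same unit-cost calculation can be sketched outside Power BI with plain Python over FOCUS-style rows. The costs mirror the example above, but the token quantities are illustrative assumptions, since the article lists only the costs.

```python
# Compute cost per token from FOCUS-style billing rows.
# Costs mirror the worked example above; the token quantities are
# illustrative assumptions, since only the costs are given.

rows = [
    {"SkuMeterSubcategory": "gpt 4o 0513 Input global Tokens",
     "EffectiveCost": 292.77, "ConsumedQuantity": 117_108_000},
    {"SkuMeterSubcategory": "gpt 4o 0513 Output global Tokens",
     "EffectiveCost": 23.40, "ConsumedQuantity": 3_120_000},
]

def unit_costs(rows):
    """Return cost per token keyed by SKU meter subcategory."""
    totals = {}
    for r in rows:
        sku = r["SkuMeterSubcategory"]
        cost, qty = totals.get(sku, (0.0, 0))
        totals[sku] = (cost + r["EffectiveCost"], qty + r["ConsumedQuantity"])
    return {sku: cost / qty for sku, (cost, qty) in totals.items()}

for sku, per_token in unit_costs(rows).items():
    # Report cost per 1,000 tokens, a common pricing granularity.
    print(f"{sku}: ${per_token * 1000:.4f} per 1K tokens")
```

Grouping by subcategory before dividing matters: summing costs and quantities first, then dividing, gives a volume-weighted unit cost rather than an average of per-row ratios.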
Power BI tip: Building a GPT cost breakdown matrix

To easily calculate token unit costs by GPT model and usage type, build a Matrix visual in Power BI using this hierarchy:

Rows:
- SKU Meter Category
- SKU Meter Subcategory
- SKU Meter Name

Values:
- EffectiveCost (sum)
- ConsumedQuantity (sum)
- Unit Cost (calculated measure)

Unit Cost = SUM('Costs'[EffectiveCost]) / SUM('Costs'[ConsumedQuantity])

Hierarchy example:

Azure OpenAI
├── GPT 4o Input global Tokens
├── GPT 4o Output global Tokens
├── GPT 4.5 Input global Tokens
└── etc.

The Power BI Matrix visual shows Azure OpenAI token usage and costs by SKU Meter Category, Subcategory, and Name. This breakdown enables calculation of unit cost per token across GPT models and usage types, supporting FinOps allocation and unit economics analysis.

What you can see at the token level

Metric | Description | Data source:

- Token volume | total tokens consumed | ConsumedQuantity
- Effective cost | actual billed cost | BilledCost / Cost
- Unit cost per token | cost divided by token quantity | effective unit price
- SKU category & subcategory | model, version, and token type (input/output) | Sku Meter Category, Subcategory, Meter Name
- Resource group / business unit | logical or organizational grouping | Resource Group, Business Unit
- Application | application or workload responsible for usage | Application (tag)

This visibility allows teams to:

- Benchmark cost efficiency across GPT models.
- Track token costs over time.
- Allocate AI costs to business units or features.
- Detect usage anomalies and optimize workload design.

Tip: Apply consistent tagging (Cost Center, Application, Environment) to Azure OpenAI resources to enhance allocation and unit economics reporting.

How the FinOps Foundation’s AI working group informs this approach

The FinOps for AI overview, developed by the FinOps Foundation’s AI working group, highlights unique challenges in managing AI-related cloud costs, including:

- Complex cost drivers (tokens, models, compute hours, data transfer).
- Cross-functional collaboration between Finance, Engineering, and ML Ops teams.
- The importance of tracking AI unit economics to connect spend with value.

By combining the FinOps toolkit, FOCUS-conformed data, and Power BI reporting, practitioners can implement many of the AI working group’s recommendations:

- Establish token-level unit cost metrics.
- Allocate costs to teams, models, and AI features.
- Detect cost anomalies specific to AI usage patterns.
- Improve forecasting accuracy despite AI workload variability.

Tip: Applying consistent tagging to AI workloads (model version, environment, business unit, and experiment ID) significantly improves cost allocation and reporting maturity.

Step 4: Allocate and report costs

With FOCUS + the FinOps toolkit:

- Allocate costs to teams, projects, or business units using Tags, ResourceId, or custom dimensions.
- Showback/chargeback AI usage costs to stakeholders.
- Detect anomalies using the toolkit’s patterns, or integrate with Azure Monitor.

Tagging tip: Add metadata to Azure OpenAI deployments for easier allocation and unit cost reporting. Example:

tags:
  CostCenter: AI-Research
  Environment: Production
  Feature: Chatbot

Step 5: Iterate using FinOps best practices

FinOps capability | Relevance:

- Reporting & analytics | visualize token costs and trends
- Allocation | assign costs to teams or workloads
- Unit economics | track cost per token or business output
- Forecasting | predict future AI costs
- Anomaly management | identify unexpected usage spikes

Start small (Crawl), expand as you mature (Walk → Run). Learn about the FinOps Framework.

Next steps

Ready to take control of your Azure OpenAI costs?

- Deploy the Microsoft FinOps toolkit: start ingesting and analyzing your Azure billing data.
- Adopt FOCUS: normalize your cost data for clarity and cross-cloud consistency.
- Calculate AI unit economics: track token consumption and unit costs using Power BI.
- Customize Power BI reports: extend toolkit templates to include token-based unit economics.
Join the conversation: Share insights or questions with the FinOps community on TechCommunity or in the FinOps Foundation Slack.
Advance Your Skills: Consider the FinOps Certified FOCUS Analyst certification.

Further Reading

Managing the cost of AI: Understanding AI workload cost considerations
Microsoft FinOps toolkit
Learn about FOCUS
Microsoft Cost Management + Billing
FinOps Foundation

Appendix: FOCUS column glossary

ConsumedQuantity: The number of tokens or units consumed for a given SKU. This is the key measure of usage.
ConsumedUnit: The type of unit being consumed, such as 'tokens', 'GB', or 'vCPU hours'. Often appears as 'Units' in Azure exports for OpenAI workloads.
PricingUnit: The unit of measure used for pricing. Should match 'ConsumedUnit', e.g., 'tokens'.
EffectiveCost: Final cost after amortization of reservations, discounts, and prepaid credits. Often derived from billing data.
BilledCost: The invoiced charge before applying commitment discounts or amortization.
PricingQuantity: The volume of usage after applying pricing rules such as tiered or block pricing. Used to calculate cost when multiplied by unit price.

Convert your Linux workloads while cutting costs with Azure Hybrid Benefit
As organizations increasingly adopt hybrid and cloud-first strategies to accelerate growth, managing costs is a top priority. Azure Hybrid Benefit provides discounts on Windows Server and SQL Server licenses and subscriptions, helping organizations reduce expenses during their migration to Azure. But did you know that Azure Hybrid Benefit also extends to Linux? In this blog, we’ll explore how Azure Hybrid Benefit for Linux enables enterprises to modernize their infrastructure, reduce cloud costs, and maintain seamless hybrid operations, all with the flexibility of easily converting their existing Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES) subscriptions. We’ll also dig into the differences in entitlements between organizations using Linux, Windows Server, and SQL licenses. Whether you’re migrating workloads or running a hybrid cloud environment, understanding this Azure offer can help you make the most of your subscription investments.

Leverage your existing licenses while migrating to Azure

Azure Hybrid Benefit for Linux allows organizations to leverage their existing RHEL or SLES subscriptions to migrate to Azure, with cost savings of up to 76% when combined with three-year Azure Reserved Instances.
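To build an intuition for how a combined figure like that can arise, note that the pay-as-you-go price of a Linux VM bundles a software subscription component with compute, and only the compute part remains once you bring your own subscription. The split and discount below are purely hypothetical numbers chosen for illustration, not published Azure or Red Hat pricing:

```python
# Hypothetical illustration of how BYOS licensing and a reserved-instance
# discount compound. The 40%/60% figures are invented for this sketch.
def combined_savings(license_share, ri_discount):
    # license_share: fraction of the PAYG rate attributable to the software
    # subscription (removed entirely under BYOS).
    # ri_discount: discount applied to the remaining compute cost.
    compute_share = 1.0 - license_share
    return license_share + compute_share * ri_discount

# If software were 40% of the PAYG rate and a 3-year RI cut compute by 60%:
print(f"{combined_savings(0.40, 0.60):.0%}")  # 76%
```

The point is only that license removal and commitment discounts multiply out; the actual percentages depend on the SKU, term, and your subscription pricing.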
This offering provides significant advantages for businesses looking to migrate their Linux workloads to Azure or optimize their current Azure deployments:

Seamless conversion: Existing pay-as-you-go Linux VMs can be converted to bring-your-own-subscription billing without downtime or redeployment.
Cost reduction: Organizations only pay for VM compute costs, eliminating software licensing fees for eligible Linux VMs.
Automatic maintenance: Microsoft handles image maintenance, updates, and patches for converted RHEL and SLES images.
Unified management: It integrates with Azure CLI and provides the same user interface as other Azure VMs.
Simplified support: Organizations can receive co-located technical support from Azure, Red Hat, and SUSE with a single support ticket.

To use Azure Hybrid Benefit for Linux, customers must have eligible Red Hat or SUSE subscriptions. For RHEL, customers need to enable their Red Hat products for Cloud Access on Azure through Red Hat Subscription Management before applying the benefit.

Minimizing downtime and licensing costs with Azure Hybrid Benefit

To illustrate the value of leveraging Azure Hybrid Benefit for Linux, let’s imagine a common use case with a hypothetical business. Contoso, a growing SaaS provider, initially deployed its application on Azure virtual machines (VMs) using a pay-as-you-go model. As demand for its platform increased, Contoso scaled its infrastructure, running a significant number of Linux-based VMs on Azure. With this growth, the company recognized an opportunity to optimize costs by negotiating a better Red Hat subscription directly with the vendor. Instead of restarting or migrating their workloads, an approach that could cause downtime and disrupt their customers' experience, Contoso leveraged Azure Hybrid Benefit for Linux VMs. This allowed them to seamlessly apply their existing Red Hat subscription to their Azure VMs without downtime, reducing licensing costs while maintaining operational stability.
By using Azure Hybrid Benefit, Contoso successfully balanced cost savings and scalability while continuing to grow on Azure and provide continuous service to their customers.

How does a Linux license differ from Windows or SQL?

Entitlements for customers using Azure Hybrid Benefit for Linux are structured differently from those for Windows Server or SQL Server, primarily due to differences in licensing models, workload types, migration strategies, and support requirements.

Azure Hybrid Benefit for Windows and SQL: Azure Hybrid Benefit helps organizations reduce expenses during their migration to the cloud by providing discounts on SQL Server and Windows Server licenses with active Software Assurance. Additionally, they benefit from free extended security updates (ESUs) when migrating older Windows Server or SQL Server versions to Azure. These customers typically manage traditional Windows-based workloads, including Active Directory, .NET applications, AKS, ADH, Azure Local, NC2, AVS, and enterprise databases, often migrating on-premises SQL Server databases to Azure SQL Managed Instance or Azure VMs. Windows and SQL customers frequently execute lift-and-shift migrations from on-premises Windows Server or SQL Server to Azure, often staying within the Microsoft stack.

Azure Hybrid Benefit for Linux: These customers leverage their existing RHEL (Red Hat Enterprise Linux) or SLES (SUSE Linux Enterprise Server) subscriptions, benefiting from bring-your-own-subscription (BYOS) pricing rather than paying Azure's on-demand rates. They typically work with enterprise Linux vendors for ongoing support. Azure Hybrid Benefit for Linux customers often run enterprise Linux workloads, such as SAP, Kubernetes-based applications, and custom enterprise applications, and are more likely to be DevOps-driven, leveraging containers, open-source tools, and automation frameworks.
Linux users tend to adopt modern, cloud-native architectures, focusing on containers (AKS), microservices, and DevOps pipelines, while often implementing hybrid and multi-cloud strategies that integrate Azure with other major cloud providers.

In conclusion, Azure Hybrid Benefit is a valuable offer for organizations looking to optimize their cloud strategy and manage costs effectively. By extending this benefit to Linux, Microsoft has opened new avenues for organizations to modernize their infrastructure, reduce cloud expenses, and maintain seamless hybrid operations. With the ability to leverage existing Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES) subscriptions, organizations can enjoy significant cost savings, seamless conversion of pay-as-you-go Linux VMs, automatic maintenance, unified management, and simplified support. Azure Hybrid Benefit for Linux not only provides flexibility and efficiency but also empowers organizations to make the most of their subscription investments while accelerating their growth in a hybrid and cloud-first world. Whether you're migrating workloads or running a hybrid cloud environment, understanding and utilizing this benefit can help you achieve your strategic goals with confidence.

To learn more go to: Explore Azure Hybrid Benefit for Linux VMs - Azure Virtual Machines | Microsoft Learn

Unlock Cost Savings with Azure AI Foundry Provisioned Throughput reservations
In the ever-evolving world of artificial intelligence, businesses are constantly seeking ways to optimize their costs and streamline their operations while leveraging cutting-edge technologies. To help, Microsoft recently announced Azure AI Foundry Provisioned Throughput reservations, which provide an innovative solution to achieve both. This offering is coming soon and will enable organizations to save significantly on their AI deployments by committing to specific throughput usage. Here’s a high-level look at what this offer is, how it works, and the benefits it brings.

What are Azure AI Foundry Provisioned Throughput reservations?

Prior to this announcement, Azure reservations could only apply to AI workloads running Azure OpenAI Service models. These Azure reservations were called “Azure OpenAI Service Provisioned reservations”. Now that more models are available on Azure AI Foundry and Azure reservations can apply to these models, Microsoft launched “Azure AI Foundry Provisioned Throughput reservations”. Azure AI Foundry Provisioned Throughput reservations are a strategic pricing offer for businesses using Provisioned Throughput Units (PTUs) to deploy AI models. Reservations enable businesses with predictable consumption patterns to reduce AI workload costs by locking in significant discounts compared to hourly pay-as-you-go pricing.

How It Works

The concept is simple yet powerful: instead of paying the PTU hourly rate for your AI model deployments, you pre-purchase a set quantity of PTUs for a specific term (either one month or one year), in a specific region and deployment type, to receive a discounted price. The reservation applies to the deployment type (e.g., Global, Data Zone, or Regional*) and region. Azure AI Foundry Provisioned Throughput reservations are not model dependent, meaning that you do not have to commit to a model when purchasing.
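As a back-of-the-envelope comparison, the illustrative GPT-4o Global rates from the footnotes at the end of this article (about $1.00/hour on hourly billing versus about $0.3027/hour with a one-year reservation) can be plugged into a quick sketch; the 730-hour month and the PTU count are assumptions for illustration:

```python
# Rough cost comparison per PTU, using the illustrative GPT-4o Global rates
# from this article's footnotes (~$1.00/hr hourly vs ~$0.3027/hr reserved).
HOURS_PER_MONTH = 730  # approximation; actual months vary

def monthly_cost(ptus, hourly_rate):
    return ptus * hourly_rate * HOURS_PER_MONTH

hourly = monthly_cost(3, 1.00)      # 3 Global PTUs at the hourly rate
reserved = monthly_cost(3, 0.3027)  # same deployment covered by reservations

print(f"hourly:   ${hourly:,.2f}/month")
print(f"reserved: ${reserved:,.2f}/month")
print(f"savings:  {1 - reserved / hourly:.1%}")
```

With these rates the saving works out to the roughly 70% quoted below; actual savings depend on the model, deployment type, region, and term.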
For example, if you deploy 3 Global PTUs in East US, you can purchase 3 Global PTU reservations in East US to significantly reduce your costs. It’s important to note that reservations are tied to deployment type and region, meaning a Global reservation won’t apply to Data Zone or Regional deployments, and an East US reservation won’t apply to West US deployments.

Key Benefits

Azure AI Foundry Provisioned Throughput reservations offer several benefits that make them an attractive option for organizations:

Cost Savings: By committing to a reservation, businesses can save up to 70% compared to hourly pricing***. This makes it an ideal choice for production workloads, large-scale deployments, and steady usage patterns.
Budget Control: Reservations are available for one-month or one-year terms, allowing organizations to align costs with their budget goals. Flexible terms ensure businesses can choose what works best for their financial planning.
Streamlined Billing: The reservation discount applies automatically to matching deployments, simplifying cost management and ensuring predictable expenditures.

How to Purchase a reservation

Purchasing an Azure AI Foundry Provisioned Throughput reservation is straightforward:

Sign in to the Azure portal and navigate to the Reservations section.
Select the scope you want the reservation to apply to (shared, management group, single subscription, single resource group).
Select the deployment type (Global, Data Zone, or Regional) and the Azure region you want to cover.
Specify the quantity of PTUs and the term (one month or one year).
Add the reservation to your cart and complete the purchase.

Reservations can be paid for upfront or through monthly payments, depending on your subscription type. The reservation begins immediately upon purchase and applies to any deployments matching the reservation's attributes.

Best Practices

Important: Azure reservations are NOT deployments; they are entirely related to billing.
The Azure reservation itself doesn’t guarantee capacity, and capacity availability is very dynamic. To maximize the value of your reservation, follow these best practices:

Deploy First: Create your deployments before purchasing a reservation to ensure you don’t overcommit to PTUs you may not use.
Match Deployment Attributes: Ensure the scope, region, and deployment type of your reservation align with your actual deployments.
Plan for Renewal: Reservations can be set to auto-renew, ensuring continuous cost savings without service interruptions.
Monitor and manage: After purchasing reservations, it is important to regularly monitor your reservation utilization and set up budget alerts.
Exchange reservations: Exchange your reservations if your workloads change during your term.

Why Choose Azure AI Foundry Provisioned Throughput reservations?

Azure AI Foundry Provisioned Throughput reservations are a perfect blend of cost efficiency and flexibility. Whether you’re deploying AI models for real-time processing, large-scale data transformations, or enterprise applications, this offering helps you reduce costs while maintaining high performance. By committing to a reservation, you can not only save money but also streamline your billing and gain better control over your AI expenses.

Conclusion

As businesses continue to adopt AI technologies, managing costs becomes a critical factor in ensuring scalability and success. Azure AI Foundry Provisioned Throughput reservations empower organizations to achieve their AI goals without breaking the bank. By aligning your workload requirements with this innovative offer, you can unlock significant savings while maintaining the flexibility and capabilities needed to drive innovation. Ready to get started?
Learn more about Azure reservations and be on the lookout for Azure AI Foundry Provisioned Throughput reservations becoming available to purchase in your Azure portal.

Additional Resources:
What are Azure Reservations? - Microsoft Cost Management | Microsoft Learn
Azure Pricing Overview | Microsoft Azure
Azure Essentials | Microsoft Azure
Azure AI Foundry | Microsoft Azure

*Not all models will be available regionally.
**Not all models will be available for Azure AI Foundry Provisioned Throughput reservations.
***The 70% savings is based on the GPT-4o Global provisioned throughput Azure hourly rate of approximately $1/hour, compared to the reduced rate of a 1-year Azure reservation at approximately $0.3027/hour. Azure pricing as of May 1, 2025 (prices subject to change; actual savings may vary depending on the specific Large Language Model and region availability).

Maximize efficiency by managing and exchanging your Azure OpenAI Service provisioned reservations
When it comes to AI, businesses confront unprecedented challenges in efficiently managing computational resources. That’s why Azure OpenAI Service is a critical platform for organizations seeking to leverage cutting-edge AI capabilities, and it makes provisioned reservations an essential strategy for intelligent cost savings. Business needs change, of course, and flexibility in managing these reservations is vital. In this blog, we’ll not only explore what makes Azure OpenAI Service provisioned reservations indispensable for organizations seeking resilience and cost efficiency in their AI operations, but also follow a fictional company, Contoso, to illustrate real-world scenarios where exchanging reservations enhances scalability and budget control.

The crucial role of provisioned reservations in modern AI infrastructure

Azure OpenAI Service provisioned reservations help organizations save money by committing to a month- or yearlong provisioned throughput unit reservation for AI model usage, ensuring guaranteed availability and predictable costs. As mentioned in this article, purchasing a reservation and choosing coverage for an Azure region, quantity, and deployment type reduces costs as compared to being charged at hourly rates. Actively managing and monitoring these reservations is paramount to unlocking their full potential. Here's why:

Optimizing utilization: Regular monitoring ensures that your reservations align with your actual usage, preventing wasted resources.
Adapting to business changes: As business needs shift, reservations can be adjusted to accommodate evolving requirements.
Avoiding over-commitment: Proactive management helps prevent over-purchasing reservations, which can lead to unnecessary expenses.
Enhancing cost control and accountability: By tracking reservation usage and costs, organizations can maintain better control over their AI budgets.
Leveraging AI usage insights: Analyzing reservation utilization provides valuable insights into AI application performance and usage patterns.

The value of exchanging provisioned reservations

One of the most powerful aspects of provisioned reservations is the ability to exchange them. This flexibility allows businesses to adapt their commitments to better align with their evolving needs. Exchanges can be initiated through the Azure portal or via the Azure Reservation API, offering seamless adjustments. Consider Contoso, a global technology firm leveraging Azure OpenAI Service for customer support chatbots and content generation tools. Initially, Contoso’s needs were straightforward, but as their business expanded, their AI requirements changed. This is where the exchange feature proved invaluable.

Types of provisioned reservations exchanges

Contoso leveraged several types of exchanges to optimize their Azure OpenAI Service usage:

Region exchange: Contoso initially committed to a reservation in the East US region. However, as their operations expanded into Western Europe, they needed to shift their AI workloads. By exchanging reservations, they were able to apply their discounted billing to the West Europe region, ensuring optimal performance for their growing user base.
Deployment type exchange: There are three types of deployment: Global, Azure geography (or regional), and Microsoft specified data zone. Contoso initially reserved regional deployments for their inference operations, but because of growing demand they switched to global deployment. This means their Azure OpenAI Service prompts and responses will now be processed anywhere that the relevant model is deployed. By exchanging reservations from regional to global, they were able to apply their reservation discount to the new deployment type, ensuring continued cost savings for their critical application.
Term exchange: Contoso initially committed to a one-month reservation.
However, they soon realized their need for ongoing service and wanted to allocate resources more efficiently. By exchanging reservations, they switched to a one-year term, allowing them to budget more effectively. Payment exchange: Contoso started with an upfront payment model. However, for better cash flow management, they transitioned to a monthly payment plan through a payment exchange. Changing the scope of provisioned reservations As Contoso’s use of Azure OpenAI Service expanded across multiple departments, they needed to modify their reservation scope. Azure offers the ability to scope reservations to individual resource groups or subscriptions, to subscriptions within a management group, or to all subscriptions within a billing account or billing profile. Contoso used Microsoft Cost Management to modify the scope of their reservations, ensuring that each department had the necessary resources. Setting up automatic renewals for provisioned reservations To prevent service disruptions and maintain budget predictability, Contoso enabled automatic renewal for their reservations. Automatic renewals offer several benefits: Continuous service: Ensures uninterrupted billing for Azure OpenAI Service. Budget predictability: Maintains consistent costs over time. Reduced administrative overhead: Eliminates the need for manual renewal processes. Enabling auto-renewal in the Azure Portal is a straightforward process, ensuring that Contoso’s AI operations continue uninterrupted. Reviewing the provisioned reservation utilization report Contoso’s finance and IT teams regularly review their provisioned reservation utilization report to ensure they are getting the best value from their investment. These reports, accessible through Azure Cost Management, provide insights into reservation usage and help identify areas for optimization. Analyzing utilization reports allows Contoso to: Identify underutilized resources. Adjust reservations to match actual usage. 
Optimize costs and improve efficiency.

Setting up utilization alerts

To proactively monitor their reservation usage, Contoso configured reservation utilization alerts in Microsoft Cost Management. These alerts notify them if usage drops below a set threshold, allowing them to take timely action. By setting up utilization alerts, Contoso can:

Receive real-time notifications of usage changes.
Adjust reservations to avoid waste.
Maintain optimal resource utilization.

Best practices for managing Azure OpenAI Service provisioned reservations

Azure OpenAI Service provisioned reservations offer a powerful way to control costs, but proactive management is essential for maximizing their value. As we have seen, Contoso implemented several best practices to maximize the benefits of provisioned reservations:

Regular usage monitoring: Continuously tracking usage to identify trends and optimize resource allocation.
Strategic adjustments and exchanges: Adapting reservations to match evolving business needs.
Implementing governance policies: Establishing clear policies for reservation management and usage.
Automating alerts and reporting: Configuring alerts and reports to proactively monitor reservation usage.

By leveraging the flexibility of reservation exchanges and implementing best practices, any business can optimize their AI investments and drive long-term efficiency. Embracing these strategies will empower your organization to fully capitalize on the transformative potential of Azure OpenAI Service. Find out more by completing the Azure OpenAI Service provisioned reservation learn module.

Additional Resources:
What are Azure reservations?
Save costs with Microsoft Azure OpenAI Service provisioned reservations
Azure OpenAI Service provisioned throughput units (PTU) onboarding
Azure pricing overview

A guide to Azure Storage and Virtual Machines cost optimization
In this blog, we will explore various techniques to optimize costs for Azure storage accounts and virtual machines, while also highlighting some important caveats to keep in mind.

Where do we begin?

Before diving into specific cost optimization strategies, it's essential to understand where your money is being spent. Microsoft Cost Management is a powerful tool that provides comprehensive insights into your cloud spending. By leveraging this tool, you can identify cost drivers, monitor spending patterns, and make informed decisions to optimize your expenses. Let's take a closer look at the Cost Management graph below to see how we can break down and analyse our expenses. The graph above shows the daily costs incurred by different services over a period of time. The x-axis represents the dates, while the y-axis represents the cost in USD ($). Different colors represent various services such as Virtual Machines, SQL Databases, Storage, etc. By analyzing this data, you can identify which services are driving your costs and take appropriate actions to optimize them. This foundational step is crucial for any cost optimization strategy and ensures that you are making data-driven decisions to maximize your cloud investments.

Optimizing costs for storage accounts

When it comes to Azure storage accounts, understanding where most of your money is being spent is crucial for effective cost optimization. Azure offers various storage options, each with its own pricing model and use cases. By analyzing your storage usage and implementing cost-saving strategies, you can significantly reduce your expenses. Again, Microsoft Cost Management is our savior :) You can find the cost breakdown of a storage account per meter, which gives us a better view of where to start our optimization. There is also a breakdown view of the different tiers. Let's dive deeper...
Major cost drivers

Let's understand the major cost drivers and areas for optimization:

Storage tiers: Azure provides different storage tiers, including Hot, Cool, Cold, and Archive. The Hot tier is for frequently accessed data and is the most expensive. The Cool and Archive tiers are more cost-effective for infrequently accessed data.
Storage transactions: Every operation on your storage account, such as read, write, and delete, incurs a transaction cost. High transaction volumes can lead to significant expenses.
Data redundancy options: Azure offers several redundancy options, such as Locally Redundant Storage (LRS), Zone-Redundant Storage (ZRS), Geo-Redundant Storage (GRS), and Read-Access Geo-Redundant Storage (RA-GRS). Higher redundancy levels provide better data protection but come at a higher cost.
Managed disks: Azure managed disks are block-level storage volumes that are managed by Azure and used with Azure Virtual Machines (VMs). They are designed to provide high durability, availability, and scalability for your applications. Not using the appropriate disk type will lead to higher costs.

Let's take a closer look at the optimization approach for each cost driver, including caveats and considerations.

Storage tier recommendations: Hot to cold tier

First, analyse the read/write transactions and data storage utilization to understand the frequency of read/write transactions and the amount of data stored.
Leverage the Cold tier to lower costs when there are fewer read and write operations but higher data storage costs.
For instance, moving data storage from the hot tier, which costs $0.195 per GB for the first 50TB/month, to the cold tier, which costs $0.0045 per GB for the first 50TB/month, can result in significant cost savings. Specifically, this transition can save approximately 97.69% on storage costs.
In summary: less storage cost but more transaction costs.

Caveats & considerations: Hot to cold tier

When you change the storage tier from hot to cold, Azure charges for write operations.
Specifically, the cost of tiering down from hot to cold can be estimated by the number of write operations required, which is calculated per 10,000 operations. You will need to pay the cost of these write operations when migrating your storage account from the hot tier to the cold tier. This is a one-time cost incurred during the migration process. After the initial migration, the ongoing storage costs in the cold tier should be lower compared to the hot tier, leading to long-term savings.

Benefits: The long-term savings will be realized over time as the daily storage costs in the cold tier are lower compared to the hot tier.

Storage transactions recommendations: V2 to V1

Please be mindful: moving from a V2 to a V1 storage account should be considered very carefully; though it's not an ideal approach, it can be a viable option to explore.

Leverage the V1 type when you pay more for transactions and less for data storage.
This recommendation is particularly beneficial when transaction costs are a significant portion of the overall storage expenses.
This option can be leveraged when you don't use most of the V2 storage features; migrating to V1 can then trigger significant savings.
In summary: less transaction cost but more data storage cost.

Caveats & considerations: V2 to V1

Transitioning from V2 to V1 is not straightforward; it requires the creation of a new storage account of the V1 type. As of today, you cannot create a V1 account from the Azure portal; it can be created via IaC (Bicep, PowerShell, Terraform...). After the account creation, you must move all the historical data to the new account. Migrating historical data from existing accounts will incur a one-time data migration cost as per the V1 write transaction cost. General-purpose V1 accounts do not provide access to Cool or Archive storage. V1 storage accounts do not support Lifecycle Management policies.
To manage data retention, you must create a solution to archive data that has not been accessed in a while.

Benefits: The long-term savings will be realized over time as the daily transaction costs in V1 are lower compared to V2.

Data redundancy recommendations: Choosing the right option

It's crucial to balance the need for redundancy with budget constraints. While higher redundancy options like Geo-Redundant Storage (GRS) provide better data protection, they come at a higher cost. Higher redundancy options may also introduce latency due to data replication across regions, which can impact the performance of applications that require low-latency access to data. By carefully evaluating the redundancy requirements, substantial savings can be achieved.

Managed disks recommendations

Managed disks can significantly impact your overall storage costs if not planned and utilized properly. Premium disks are designed for high-performance and low-latency workloads. However, due to their higher cost compared to standard disks, it's essential to use them judiciously and ensure they are allocated based on actual performance needs.

Analyse the usage of premium disks on your virtual machines (VMs). Specifically, focus on VMs that are primarily used for read-only caching. Understand the performance requirements and whether the premium disks can be changed to standard disks when intensive write performance is not needed.
Consider moving premium disks to standard disks in lower environments, such as development and testing environments, if possible.
Consider using ephemeral disks for AKS node pools. Ephemeral disks are temporary storage directly attached to the VM that provide high performance at a lower cost. If you don't need data persistence for your stateless workloads, ephemeral disks are the right choice.
Finally, identify and remove orphan disks (disks that are not attached to any VMs). These disks can incur unnecessary costs.
Orphan disks can be identified in many ways: you can use a script, or an Azure workbook, which can be handy for finding orphan resources. dolevshor/azure-orphan-resources: Centralize orphan resources in Azure environments

Caveats & considerations: Premium to standard

Premium disks offer higher IOPS (Input/Output Operations Per Second) and lower latency compared to standard disks. Moving to standard disks may result in reduced performance, which could affect applications that require high-speed data access. Make sure to analyze your usage patterns and see if moving from premium to standard disks is the right choice for your specific use case.

Optimizing costs for Azure virtual machines

Managing virtual machine costs effectively is crucial for profitability and sustainable growth. Over-provisioned resources often lead to unnecessary expenses, but optimizing resource usage and reducing unnecessary VMs can result in substantial savings and positive environmental impacts. The image above represents the different SKUs used and the wasted cost for each SKU. Virtual machines are segregated by SKU, and each SKU's cost is the sum of the multiple VMs in that SKU.

Major cost drivers

Let us understand the major cost drivers and areas for optimization:

Stock keeping unit (SKU): The selection of the right Stock Keeping Unit (SKU) is a critical factor in optimizing costs. Different SKU types serve various needs:
B Series: Ideal for small, non-critical environments with low resource consumption.
F Series: Suitable for CPU-intensive workloads requiring high computational power.
E Series: Best for memory-intensive applications needing substantial RAM.
D Series: A general-purpose SKU that is often the default choice but might not always be the most cost-efficient.

Avoiding the default selection of the D Series and choosing the appropriate SKU based on specific application requirements can significantly reduce costs and improve efficiency.
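The series guidance above can be captured in a small helper that maps observed utilization to a suggested series. This is a sketch only: the 20%/60% thresholds are placeholders to tune against your own monitoring data, not official Azure sizing rules:

```python
# Sketch of the SKU-series guidance above. The 20%/60% thresholds are
# placeholder values, not official sizing rules; tune them to your environment.
def suggest_series(avg_cpu_pct, avg_mem_pct, critical=True):
    if not critical and avg_cpu_pct < 20 and avg_mem_pct < 20:
        return "B"  # small, non-critical, bursty workloads
    if avg_cpu_pct >= 60 and avg_cpu_pct >= avg_mem_pct:
        return "F"  # CPU-intensive
    if avg_mem_pct >= 60:
        return "E"  # memory-intensive
    return "D"      # general-purpose default

print(suggest_series(10, 15, critical=False))  # B
print(suggest_series(85, 40))                  # F
print(suggest_series(30, 80))                  # E
print(suggest_series(45, 45))                  # D
```

In practice the inputs would come from Azure Monitor metrics averaged over a representative window, and the suggestion would be validated against workload criticality before resizing.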
This image represents the CPU utilization of different virtual machines under a particular subscription. By analyzing the usage, you can determine the appropriate SKU for your virtual machine.

Recommendations: Choosing the correct SKU series

First, start monitoring the CPU and memory utilization of virtual machines to identify the exact load on them. Create a workbook to monitor the utilization, or use existing monitoring tools for CPU and memory utilization.
If the workload is for a small, non-critical environment, use the B series. If it is more CPU intensive, choose the F series, and if it is memory intensive, choose the E series.
Based on the load and cost, choose the best SKU, and use the latest version of the SKU.
For memory-intensive workloads, a Standard_D16_v3 costs around $1.504 per hour, and moving to the E series (e.g., a Standard_E8_v5 at around $0.872 per hour) can save roughly 42% of the cost.
Using the latest SKU version also saves cost: a Standard_D16_v3 costs around $1.672 per hour, while the latest version, Standard_D16_v5, costs around $1.632 per hour.

Databricks VM SKU recommendations

Balance cost efficiency and performance when selecting Databricks VM SKUs. Monitor CPU and memory utilization to choose the appropriate series (B for low cost, F for CPU-intensive, E for memory-intensive tasks). Consider upgrading to the latest SKU versions for potential cost savings. Use automation on Databricks VMs to save extensive cost. Use reserved instances for one to three years to save up to ~60% of cost.

Reservations for virtual machines

Reserving virtual machines can help optimize resource usage and reduce operational costs. Committing to a reservation plan allows businesses to take advantage of significant discounts compared to pay-as-you-go pricing. This approach is particularly beneficial for workloads with predictable and consistent usage patterns.
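Before committing to any SKU move, it helps to compute the saving from the hourly rates directly. A quick helper using the per-hour example prices quoted above (prices vary by region, OS image, and over time):

```python
# Compare two hourly VM rates; the prices used below are the article's
# examples and will vary by region, OS image, and over time.
def hourly_savings(current_rate, target_rate):
    """Fraction of the current hourly rate saved by moving to the target rate."""
    return 1 - target_rate / current_rate

# Standard_D16_v3 -> Standard_E8_v5 (right-sizing a memory-heavy workload)
print(f"{hourly_savings(1.504, 0.872):.1%}")  # ~42%
# Standard_D16_v3 -> Standard_D16_v5 (moving to the latest SKU version)
print(f"{hourly_savings(1.672, 1.632):.1%}")  # ~2.4%
```

Multiplying the hourly difference by expected running hours gives the absolute saving, which can then be weighed against any migration effort.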
Auto-shutdown of virtual machines
Auto-shutdown of virtual machines saves energy costs and frees resources, enhancing both performance and sustainability. It also limits unauthorized access, aiding compliance with data privacy regulations.

Shutdown and deallocation of virtual machines:
- Virtual machines that have not been used for a long time can be deallocated to stop paying for the resources and components attached to them.
- Virtual machines that are only needed at particular times can be auto-shutdown to avoid unwanted charges.

Orphaned disks (unattached disks)
Orphaned disks are storage disks that remain unattached to any virtual machine (VM) in the cloud environment, leading to unnecessary costs and resource inefficiency.

Identification and management:
- Use tools such as Azure workbooks to detect unattached disks.
- Use a Resource Graph Explorer query to get the details of orphaned disks.
- Implement policies and alerts for prompt notifications.
- Establish routine clean-up practices to either reattach or delete orphaned disks.
Efficient management minimizes costs and optimizes resource utilization.

Resource Graph queries
Below are the Resource Graph queries that were particularly handy for the storage and VM findings.
To find premium disks with read-only caching:

```kusto
resources
| where type == "microsoft.compute/virtualmachines"
| join kind=inner (
    ResourceContainers
    | where type == "microsoft.resources/subscriptions"
    | project subscriptionId, subscriptionName = name
) on subscriptionId
| mv-expand dataDisks = properties.storageProfile.dataDisks
| extend osDisk = properties.storageProfile.osDisk
| project resourceGroup, subscriptionName, location,
    diskName = dataDisks.name,
    caching = dataDisks.caching,
    diskType = "DataDisk",
    sku = dataDisks.managedDisk.storageAccountType
| union (
    resources
    | where type == "microsoft.compute/virtualmachines"
    | join kind=inner (
        ResourceContainers
        | where type == "microsoft.resources/subscriptions"
        | project subscriptionId, subscriptionName = name
    ) on subscriptionId
    | project resourceGroup, subscriptionName, location,
        diskName = properties.storageProfile.osDisk.name,
        caching = properties.storageProfile.osDisk.caching,
        diskType = "OSDisk",
        sku = properties.storageProfile.osDisk.managedDisk.storageAccountType
)
| where caching == "ReadOnly"
```

To find StorageV2 accounts with the Hot tier for cold migration:

```kusto
Resources
| where type == "microsoft.storage/storageaccounts"
| join kind=inner (
    ResourceContainers
    | where type == "microsoft.resources/subscriptions"
    | project subscriptionId, subscriptionName = name
) on subscriptionId
| project subscriptionName, name, kind, location, resourceGroup, sku.tier,
    Tier = properties.accessTier,
    replication = sku.name
| where kind == 'StorageV2' and Tier == 'Hot'
```

To find ephemeral & managed disks attached to VMSS (AKS node pools):

```kusto
resources
| where type == "microsoft.compute/virtualmachinescalesets"
| join kind=inner (
    ResourceContainers
    | where type == "microsoft.resources/subscriptions"
    | project subscriptionId, subscriptionName = name
) on subscriptionId
| project subscriptionId, subscriptionName, name,
    instancesCount = tolong(sku.capacity),
    location,
    sku = sku.name,
    diskSize = properties.virtualMachineProfile.storageProfile.osDisk.diskSizeGB,
    storageType = properties.virtualMachineProfile.storageProfile.osDisk.managedDisk.storageAccountType,
    caching = properties.virtualMachineProfile.storageProfile.osDisk.caching,
    diskType = iif(isnull(properties.virtualMachineProfile.storageProfile.osDisk.diffDiskSettings), "Managed", "Ephemeral")
| where storageType contains "Premium"
| where name contains "aks"
```

To find the orphaned disks:

```kusto
Resources
| where type == "microsoft.compute/disks"
| join kind=inner (
    ResourceContainers
    | where type == "microsoft.resources/subscriptions"
    | project subscriptionId, subscriptionName = name
) on subscriptionId
| where isnull(properties.diskState) or properties.diskState == "Unattached"
| project name, id, location, resourceGroup, properties, subscriptionId, subscriptionName
```

To find the details of VMs:

```kusto
Resources
| where type == "microsoft.compute/virtualmachines"
| join kind=inner (
    ResourceContainers
    | where type == "microsoft.resources/subscriptions"
    | project subscriptionId, subscriptionName = name
) on subscriptionId
| project vmName = name, vmId = id, subscriptionId, subscriptionName, location, resourceGroup,
    sku = properties.hardwareProfile.vmSize,
    osType = properties.storageProfile.osDisk.osType
```

To find VM details excluding Databricks VMs:

```kusto
Resources
| where type == "microsoft.compute/virtualmachines"
| join kind=inner (
    ResourceContainers
    | where type == "microsoft.resources/subscriptions"
    | project subscriptionId, subscriptionName = name
) on subscriptionId
| project vmName = name, vmId = id, subscriptionId, subscriptionName, location, resourceGroup,
    sku = properties.hardwareProfile.vmSize,
    osType = properties.storageProfile.osDisk.osType
| where resourceGroup !contains "databricks" and resourceGroup !contains "adb"
```

References:
- Azure workbook to find orphan resources: dolevshor/azure-orphan-resources: Centralize orphan resources in Azure environments
- Azure workbook to find CPU utilization:
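The orphaned-disk query's filter can also be reproduced client-side, for example when post-processing exported query results. A minimal Python sketch over a hypothetical result set (the field names mirror the Resource Graph output; the sample records are invented for illustration):

```python
# Hypothetical export of disk records, shaped like the Resource Graph results above.
disks = [
    {"name": "web-os-disk", "diskState": "Attached", "resourceGroup": "rg-web"},
    {"name": "old-data-disk", "diskState": "Unattached", "resourceGroup": "rg-web"},
    {"name": "legacy-disk", "diskState": None, "resourceGroup": "rg-legacy"},
]

def find_orphans(records):
    """Mirror the KQL filter: isnull(diskState) or diskState == 'Unattached'."""
    return [d["name"] for d in records
            if d.get("diskState") in (None, "Unattached")]

print(find_orphans(disks))  # → ['old-data-disk', 'legacy-disk']
```

A filter like this is a convenient pre-check before wiring the list into an alert or a scheduled clean-up job.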
Azure Workbooks overview - Azure Monitor | Microsoft Learn

How to choose the right reserved instance in Azure
Are you considering committing to a reserved instance in Azure? Our guide will help you make an informed decision by providing insights on the benefits, flexibility, and limitations of reserved instances. Learn about the steps to evaluate and purchase a reserved instance, the tools and services to optimize and manage them, and the alternative options available. Make the most of your investment with our comprehensive guide to Azure Reserved Instances.