As data becomes larger and more complex, organizations are turning to medallion architecture for efficient data management. This method combines data processing, storage, reporting, and machine learning into one system, using a single data lake to integrate various data sources. Commonly, this is done using Azure Data Factory, Azure Data Lake Storage Gen2, Databricks, Azure Synapse, and Power BI.
Microsoft Fabric offers a new, comprehensive solution for businesses. It is a SaaS offering that separates compute from storage and extends the medallion architecture with Data Fabric and Data Mesh patterns.
Advantages of Fabric
Direct store delivery (DSD) is a distribution method that allows manufacturers to deliver directly to stores, bypassing wholesalers entirely. While it offers better control over inventory and pricing, it also introduces challenges such as managing routes, drivers, vehicles, and orders in real time. To address these DSD challenges, the retailer needed a solution that:
1. Streamlined operations and management: As a unified platform, Microsoft Fabric simplifies the handling of compute resources and storage.
2. Consolidated data source: Organizations often scatter data across multiple locations. Microsoft Fabric uses OneLake as its central storage, and its 'shortcut' feature (which allows references to delta tables in external storage such as ADLS Gen2 or S3) ensures a unified data reference point.
3. Seamless system integration: It easily integrates with existing data storage systems, whether they're in Azure Data Lake Storage, Amazon S3, or SQL Server.
4. Simplified security: OneSecurity offers a simplified security and governance model.
Adoption strategy
Given these benefits, Microsoft Fabric seems like the natural evolution of the medallion architecture. However, because Microsoft Fabric is only now becoming generally available, some features still need to be implemented before it reaches parity with existing solutions, and shifting to it immediately is a large task. We recommend a three-phase adoption:
1. Use Fabric in the Platinum layer for reporting.
2. Decouple the Gold and Platinum layers.
3. Transform your data analytics landscape.
Phase 1: Use Fabric in the Platinum layer for reporting
Power BI Premium users can easily upgrade to Microsoft Fabric, as it's included in the Premium SKU. The Fabric F64 SKU may seem more expensive than its equivalent, Power BI Premium P1 (at $4,995 per capacity/month). This is because Fabric is priced on a pay-as-you-go model, which offers the flexibility to adjust capacity based on demand. Reserved instance pricing for Fabric, which is expected to be a lower-cost option, has not been announced.
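To make the pay-as-you-go trade-off concrete, the sketch below computes how many hours per month a pay-as-you-go capacity could run before it costs more than a fixed monthly price. The $4,995 figure is the P1 price quoted above; the hourly rate is a hypothetical placeholder rather than a published Fabric price, and the function names are illustrative only.

```python
# Fixed monthly price of Power BI Premium P1, as quoted in the text.
P1_MONTHLY_USD = 4995.00

# Hypothetical hourly rate, for illustration only.
# Look up the actual F64 pay-as-you-go rate for your region.
ASSUMED_F64_HOURLY_USD = 10.00


def monthly_cost(hourly_rate_usd: float, hours_running: float) -> float:
    """Cost of a pay-as-you-go capacity that runs for the given number of hours."""
    if hourly_rate_usd < 0 or hours_running < 0:
        raise ValueError("rate and hours must be non-negative")
    return hourly_rate_usd * hours_running


def break_even_hours(fixed_monthly_usd: float, hourly_rate_usd: float) -> float:
    """Hours per month at which pay-as-you-go cost equals the fixed monthly price."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    return fixed_monthly_usd / hourly_rate_usd


if __name__ == "__main__":
    hours = break_even_hours(P1_MONTHLY_USD, ASSUMED_F64_HOURLY_USD)
    print(f"Break-even at about {hours:.1f} hours per month")
```

If a capacity only needs to run for fewer hours than the break-even point, for example during business hours, pay-as-you-go may come out cheaper under these assumptions. Check current regional pricing before relying on any such estimate.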
Figure 1: Microsoft Fabric pricing

Future smoothed consumption | Platform policy | Impact |
---|---|---|
Usage ≤ 10 minutes | Overage protection | Jobs can consume 10 minutes of future capacity use without throttling. |
10 minutes < Usage ≤ 60 minutes | Interactive delay | User-requested interactive-type jobs will be throttled. |
60 minutes < Usage ≤ 24 hours | Interactive rejection | User-requested interactive-type jobs will be rejected. |
Usage > 24 hours | Background rejection | User-scheduled background jobs will be rejected from execution. |
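The thresholds in the table can be read as a simple lookup from future smoothed consumption to platform policy. The following is a minimal sketch of that lookup for reasoning about capacity behavior; the names `ThrottlePolicy` and `policy_for` are hypothetical and are not part of any Fabric API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottlePolicy:
    name: str
    impact: str


# (upper bound in minutes of future smoothed consumption, policy), from the table.
_POLICIES = [
    (10, ThrottlePolicy(
        "Overage protection",
        "Jobs can consume 10 minutes of future capacity use without throttling.")),
    (60, ThrottlePolicy(
        "Interactive delay",
        "User-requested interactive-type jobs will be throttled.")),
    (24 * 60, ThrottlePolicy(
        "Interactive rejection",
        "User-requested interactive-type jobs will be rejected.")),
]

# Applies to anything beyond 24 hours of future consumption.
_BACKGROUND_REJECTION = ThrottlePolicy(
    "Background rejection",
    "User-scheduled background jobs will be rejected from execution.")


def policy_for(future_minutes: float) -> ThrottlePolicy:
    """Return the policy that applies to a given amount of future smoothed consumption."""
    if future_minutes < 0:
        raise ValueError("future_minutes must be non-negative")
    for upper_bound, policy in _POLICIES:
        if future_minutes <= upper_bound:
            return policy
    return _BACKGROUND_REJECTION


if __name__ == "__main__":
    for minutes in (5, 45, 300, 2000):
        print(minutes, "->", policy_for(minutes).name)
```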
Phase 2: Decouple the Gold and Platinum layers
Figure 2: Architecture flow diagram
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
Phase 3: Transform your data analytics landscape
Figure 4: Architecture diagram
Fabric setup
Start by forming a gold layer project team to create a strategy that fits your organization. Be aware of some current limitations in Microsoft Fabric: capabilities that are common in existing systems, such as invoking pipelines across workspaces or defining identity columns, are not yet supported. While a one-size-fits-all guideline is challenging, the following recommendations can help structure your Fabric workloads with the current features. We will revisit these guidelines as new features become available.
1. Create three capacities: Allocate separate capacities for development and testing, pre-production, and production. This helps in testing production-level workloads in a controlled environment.
2. Organize workspaces for data categories: Establish common bronze and silver workspaces for each data type. For large enterprises, consider separating bronze and silver from gold workloads to prevent resource throttling issues in the gold layer.
3. Manage access:
a. Assign Pro licenses to developers and add them, through Azure Active Directory (AAD) groups, to the Contributor role.
b. Grant access to bronze and silver workspaces for the core team, and gold workspace access to gold project teams.
c. Add workspace admin groups to the admin role.
d. Create AAD groups per subject area, assigning gold development teams accordingly for access to relevant bronze and silver objects. Implement object-, row-, and column-level security using SQL endpoints.
4. Implement version control: Set up an Azure repository for versioning Fabric solutions.
5. Establish deployment pipelines: Use these for promoting solutions to higher environments.
6. Use the Capacity Metrics app: Install this app for admins and share it with development groups for visibility into capacity usage and workload impact.
7. Create domains for data mesh: Group project workspaces into domains like sales, marketing, etc., and manage access within these domains. Assign domain admins and contributors and allocate workspaces to each domain.
9. Integrate Purview for protection: Set up Microsoft Purview integration in Fabric for information protection and data loss prevention. Set protection labels in the Purview portal and choose between mandatory or programmatic labeling for Power BI reports. Data loss prevention policies, currently applicable only to datasets, should be defined in the Microsoft Purview Compliance portal.
10. Use the Purview Hub report: This report, available to capacity admins, provides an overview of all items across workspaces, including resource promotions/certifications and sensitivity labels. Share it with the data stewards group to ensure adherence to organizational processes.
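The capacity and workspace recommendations above can be sketched as a small planning helper that enumerates the workspaces to create in each environment. This is only an illustration under stated assumptions: it reads "common bronze and silver workspaces for each data type" as one bronze and one silver workspace per data category, and the environment labels, naming pattern, and `plan_workspaces` function are all hypothetical rather than Fabric conventions.

```python
from itertools import product

# One capacity per environment, following the three-capacity recommendation.
# The labels themselves are hypothetical.
ENVIRONMENTS = ("devtest", "preprod", "prod")

# Layers shared by the core team, one workspace per data category (an assumption).
SHARED_LAYERS = ("bronze", "silver")


def plan_workspaces(data_categories, gold_projects):
    """Return {environment: [workspace names]} for shared and gold workspaces."""
    plan = {}
    for env in ENVIRONMENTS:
        # Bronze and silver workspaces for every data category.
        names = [
            f"{category}-{layer}-{env}"
            for category, layer in product(data_categories, SHARED_LAYERS)
        ]
        # Separate gold workspaces, one per gold project team.
        names += [f"{project}-gold-{env}" for project in gold_projects]
        plan[env] = names
    return plan


if __name__ == "__main__":
    plan = plan_workspaces(["sales", "marketing"], ["forecasting"])
    for env, names in plan.items():
        print(env, names)
```

Keeping gold workspaces separate from bronze and silver in the plan mirrors the recommendation to isolate gold workloads from resource throttling in the shared layers.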
In summary
Microsoft Fabric went beyond addressing the retailer's core challenges, delivering transformative benefits to their direct store delivery (DSD) operations. By facilitating real-time data ingestion and processing, Fabric provided a unified, up-to-the-moment view of all DSD activities, allowing for quick and strategic decision-making.
The implementation also enabled real-time analytics and reporting capabilities, with Power BI integration ensuring that insights were both accessible and actionable. Fabric's scalable infrastructure laid the groundwork for predictive analytics applications, equipping the retailer with the tools to anticipate market trends and optimize their supply chain. Costs were optimized as well.
The result was a significant shift in how the retailer operated: increased efficiency, reduced costs, and a stronger connection with customers.