November 15, 2023

Embracing the Future of Data Management with Microsoft Fabric: A Setup Guide



As data becomes larger and more complex, organizations are turning to medallion architecture for efficient data management. This method combines data processing, storage, reporting, and machine learning into one system, using a single data lake to integrate various data sources. Commonly, this is done using Azure Data Factory, Azure Data Lake Storage Gen2, Databricks, Azure Synapse, and Power BI.

Microsoft Fabric offers a new, comprehensive solution for businesses. It's a SaaS offering that separates compute and storage, and it extends the medallion architecture with Data Fabric and Data Mesh patterns.


Advantages of Fabric

Compared with assembling the individual services listed above, Microsoft Fabric offers the following advantages:

1. Streamlined operations and management: As a unified platform, Microsoft Fabric simplifies the handling of compute resources and storage.

2. Consolidated data source: Organizations often scatter data across multiple locations. Microsoft Fabric, with OneLake as the central storage and the 'shortcut' feature (allowing references to delta tables in external storages like ADLS Gen2/S3), ensures a unified data reference point.

3. Seamless system integration: It easily integrates with existing data storage systems, whether they're in Azure Data Lake Storage, Amazon S3, or SQL Server.

4. Simplified security: OneSecurity offers a simplified security and governance model.  



Adoption strategy

Microsoft Fabric, with its benefits, seems like the natural evolution of the medallion architecture. However, as Microsoft is just making Microsoft Fabric generally available, there are still features that need to be implemented to achieve feature parity with the existing solutions. Shifting to it immediately is a large task. We recommend a three-phase adoption:

1. Use Fabric in the Platinum layer for reporting.
2. Decouple the Gold and Platinum layers.
3. Transform your data analytics landscape.



Phase 1: Use Fabric in the Platinum layer for reporting

Power BI Premium users can easily upgrade to Microsoft Fabric, as it's included in the Premium SKU. The pay-as-you-go price of the Fabric F64 SKU may seem higher than that of its Power BI Premium equivalent, P1 ($4,995 per capacity per month). This is because the pay-as-you-go model offers the flexibility to adjust capacity based on demand. Reserved instance pricing for Fabric, which is expected to be a lower-cost option, has not been announced.


Figure 1: Microsoft Fabric pricing
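To make the pay-as-you-go trade-off concrete, the following minimal sketch estimates how many hours per month an F64 capacity could run before its cost matches the P1 price quoted above. The hourly rate per CU (ASSUMED_RATE_PER_CU_HOUR) is a placeholder assumption, not a figure from this article, and the sketch also assumes that cost is proportional to the hours the capacity is running. Check current regional pricing before relying on the result.

```python
P1_MONTHLY_USD = 4995.0    # Power BI Premium P1 price quoted in this article
F64_CAPACITY_UNITS = 64    # an F64 capacity provides 64 CUs

# Placeholder assumption: replace with the rate for your region.
ASSUMED_RATE_PER_CU_HOUR = 0.18


def payg_monthly_cost(rate_per_cu_hour, hours_running,
                      capacity_units=F64_CAPACITY_UNITS):
    """Pay-as-you-go cost, assuming billing scales with running hours."""
    if rate_per_cu_hour <= 0 or hours_running < 0:
        raise ValueError("rate must be positive and hours non-negative")
    return rate_per_cu_hour * capacity_units * hours_running


def break_even_hours(rate_per_cu_hour,
                     fixed_monthly_usd=P1_MONTHLY_USD,
                     capacity_units=F64_CAPACITY_UNITS):
    """Running hours per month at which pay-as-you-go equals the fixed price."""
    if rate_per_cu_hour <= 0:
        raise ValueError("rate must be positive")
    return fixed_monthly_usd / (rate_per_cu_hour * capacity_units)


if __name__ == "__main__":
    hours = break_even_hours(ASSUMED_RATE_PER_CU_HOUR)
    print(f"Break-even at about {hours:.0f} running hours per month")
```

If the capacity runs fewer hours than the break-even value, for example because it is paused outside business hours, pay-as-you-go would come out cheaper under these assumptions.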

In terms of usage, the reporting experience in Fabric is similar to Power BI, meaning existing features will work seamlessly. The first step is to set up a Lakehouse in Fabric, using shortcuts to data in the ADLS layer for Power BI reports.


Changes in Fabric compared to your existing Power BI setup

1. Tracking at CU level: Billing and tracking in Fabric are based on Capacity Units (CUs). You can see all this information in the updated Capacity Metrics App, which displays all associated workspaces.

2. Improved workload management: Fabric lets jobs temporarily use more capacity than provisioned when they need it (bursting) and spreads the resulting consumption over time (smoothing). This keeps tasks performing well and lets you plan capacity around average rather than peak workload.

3. Capacity usage and autoscaling: You can use up to 10 minutes of future capacity without affecting operations. Additionally, there's an autoscale feature that automatically increases capacity during high-demand periods. The table below outlines this in more detail.



Future smoothed consumption      | Platform policy       | Impact
Usage ≤ 10 minutes               | Overage protection    | Jobs can consume 10 minutes of future capacity use without throttling.
10 minutes < Usage ≤ 60 minutes  | Interactive delay     | User-requested interactive-type jobs will be throttled.
60 minutes < Usage ≤ 24 hours    | Interactive rejection | User-requested interactive-type jobs will be rejected.
Usage > 24 hours                 | Background rejection  | User-scheduled background jobs will be rejected from execution.
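The thresholds in the table can be read as a simple lookup. The sketch below is illustrative only: the function name throttling_policy and its input (minutes of future capacity already consumed) are assumptions made for this example, not part of any Fabric API.

```python
def throttling_policy(future_minutes):
    """Map minutes of future smoothed consumption to the platform policy."""
    if future_minutes < 0:
        raise ValueError("future_minutes must be non-negative")
    if future_minutes <= 10:
        return "Overage protection"      # no throttling
    if future_minutes <= 60:
        return "Interactive delay"       # interactive jobs are throttled
    if future_minutes <= 24 * 60:
        return "Interactive rejection"   # interactive jobs are rejected
    return "Background rejection"        # scheduled background jobs are rejected


if __name__ == "__main__":
    for minutes in (5, 30, 180, 2000):
        print(minutes, "->", throttling_policy(minutes))
```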



Phase 2: Decouple the Gold and Platinum layers

The next step is to separate the Platinum and Gold layers. This means adding team members with development skills to the reporting team. This will give them full control over the Platinum/semantic layer and reduce dependency on the engineering team for data formatting.

In this setup, Fabric is primarily used for reporting and the semantic layer.

Figure 2: Architecture flow diagram

Direct Lake reports

Direct Lake is a new dataset type in Microsoft Fabric for Power BI reports, adding to the existing DirectQuery and Import models. It combines the best of both: the efficiency of Import mode and the ability to handle large data like DirectQuery. Direct Lake datasets access data in OneLake directly, eliminating the need for converting to other query languages and bypassing the need for manual dataset refreshes. This makes it perfect for querying large datasets that are frequently updated.

For optimal performance, it's recommended to write data to OneLake with V-Order enabled. Unlike Z-Order, which co-locates related data across files, V-Order is a write-time optimization of the Parquet files themselves that makes reads faster for the Fabric engines, including Direct Lake. In a Fabric notebook, you can enable it for the Spark session before writing the table:

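# Enable V-Order for Parquet writes in this Spark session (spark is predefined in Fabric notebooks)
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# df is an existing Spark DataFrame; save it as a Delta table in the Lakehouse
df.write.format("delta").saveAsTable("TableName")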

Figure 3: Saving Delta tables with V-Order enabled

When using Direct Lake datasets in Microsoft Fabric, keep these points in mind:
1. Review the documented limitations of Direct Lake before committing a report to it.
2. Direct Lake datasets can fall back to DirectQuery mode, for example when a table exceeds the limits of the capacity SKU, resulting in slower performance.

Granting engineering teams access to Fabric resources in a development setting will help them become familiar with its data integration, engineering, and real-time capabilities.



Phase 3: Transform your data analytics landscape

The next step is to move your current processing and ingestion layers to Microsoft Fabric. However, we suggest waiting for this step until Fabric capabilities such as APIs, security, governance, CI/CD, and Git integration reach maturity.

When migrating, create separate workspaces for the bronze, silver, and gold layers. If needed, divide the silver layer further into workspaces for different areas like Sales, HR, Finance, etc. Use OneSecurity for table and column level security to ensure access restrictions are maintained across all resources using the data.
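As a minimal sketch of that layout, the snippet below lists the workspaces to create for a given prefix, splitting the silver layer by subject area when areas are supplied. The naming convention, the prefix "contoso", and the function plan_workspaces are hypothetical choices for illustration. The sketch does not call any Fabric API.

```python
LAYERS = ("bronze", "silver", "gold")


def plan_workspaces(prefix, silver_areas=()):
    """Return workspace names per medallion layer.

    When silver_areas is given, the silver layer is split into one
    workspace per subject area instead of a single silver workspace.
    """
    names = []
    for layer in LAYERS:
        if layer == "silver" and silver_areas:
            for area in silver_areas:
                names.append(f"{prefix}-silver-{area.lower()}")
        else:
            names.append(f"{prefix}-{layer}")
    return names


if __name__ == "__main__":
    for name in plan_workspaces("contoso", ["Sales", "HR", "Finance"]):
        print(name)
```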


Figure 4: Architecture diagram


Fabric setup

Start by forming a gold layer project team to create a strategy that fits your organization. Be aware of some limitations in Microsoft Fabric, like invoking pipelines across workspaces or defining identity columns, which are common in existing systems. While a one-size-fits-all guideline is challenging, the following recommendations can help structure your Fabric workloads with the current features. We will be revisiting these guidelines as new features become available.

1. Create three capacities: Allocate separate capacities for development and testing, pre-production, and production. This helps in testing production-level workloads in a controlled environment.

2. Organize workspaces for data categories: Establish common bronze and silver workspaces for each data type. For large enterprises, consider separating bronze and silver from gold workloads to prevent resource throttling issues in the gold layer.

3. Manage access:

a. Assign Pro licenses to developers and add them, through Azure Active Directory (AAD) groups, to the workspace Contributor role.

b. Grant access to bronze and silver workspaces for the core team, and gold workspace access to gold project teams.

c. Add workspace admin groups to the admin role.

d. Create AAD groups per subject area, assigning gold development teams accordingly for access to relevant bronze and silver objects. Implement object-, row-, and column-level security using SQL endpoints.

4. Implement version control: Set up an Azure DevOps Git repository for versioning Fabric solutions.

5. Establish deployment pipelines: Use these for promoting solutions to higher environments.

6. Use the Capacity Metrics App: Install this app for admins and share it with development groups for visibility into capacity usage and workload impact.

7. Create domains for data mesh: Group project workspaces into domains like sales, marketing, etc., and manage access within these domains. Assign domain admins and contributors and allocate workspaces to each domain.



Figure 5: Creating a new domain 

8. Access for gold layer users: Each project team should handle user access through a workspace, using an Azure Active Directory (AAD) group with read-only permissions. This access covers both SQL and Power BI reports. For SQL endpoints, manage permissions with SQL GRANT and DENY commands.

9. Integrate Purview for protection: Set up Microsoft Purview integration in Fabric for information protection and data loss prevention. Set protection labels in the Purview portal and choose between mandatory or programmatic labeling for Power BI reports. Data loss prevention policies, currently applicable only to datasets, should be defined in the Microsoft Purview Compliance portal.

10. Use Purview Hub Report: This report, available to capacity admins, provides an overview of all items across workspaces, including resource promotions/certifications and sensitivity labels. Share it with the data stewards group to ensure adherence to organizational processes.


In summary

Microsoft Fabric builds on the medallion architecture with a unified SaaS platform, OneLake as a single point of reference for data, and a simplified security model. Because some features are still maturing, a phased adoption keeps the effort manageable: start with reporting in the Platinum layer, then decouple the Gold and Platinum layers, and move ingestion and processing to Fabric once APIs, security, governance, and CI/CD have matured.

The setup recommendations above give you a starting structure for capacities, workspaces, access, and governance that you can revisit as new features become available.


Want to learn more?

When it comes to implementing Microsoft Fabric, you need a partner that you can trust to deliver the results you need. As a Fabric Featured Partner, our certified team has the deep expertise and experience you need to design, deploy, and manage a successful Microsoft Fabric environment. We offer a comprehensive suite of services, from planning and design to deployment and support, to help you get the most out of your investment in Microsoft Fabric. 

Contact Sales@MAQSoftware.com to learn more about how MAQ Software can help you achieve your business goals with Microsoft Fabric. Explore our Fabric services and Marketplace offerings today.