
Beyond the Flat Table: Building an Enterprise-Grade Financial Model in Power BI

We’ve all been there: you open Power BI, drag a messy Excel sheet onto the canvas, and start dropping charts until something looks “right.” It’s easy, it’s intuitive, and honestly, that’s why Power BI is one of my favourite tools for data visualisation.

But as the world of data shifts toward end-to-end solutions like Microsoft Fabric, “just making it work” isn’t enough anymore. Large organisations need models that are performant, secure, and scalable.

I’ve decided to challenge myself by taking the PL-300: Microsoft Power BI Data Analyst exam. But instead of just grinding through practice tests or memorising definitions, I’m going into “Practical Mode.” If I’m going to get certified, I want to prove I can solve the problems real businesses actually face.

The Mission: The Enterprise-Grade Financial Suite

For my first project, I’m tackling the Executive Financial Health Suite.

Why finance? Because in the enterprise world, it’s the ultimate test of your Data Modeling and DAX skills. Most “generic” tutorials use a single, flat table. But in a real company, data is fragmented. You have “Actuals” (what happened) sitting in one place and “Budgets” (the goal) sitting in another, usually at different levels of detail.

In this project, I’m going to document how I:

  • Deconstruct a “Flat Table” into a clean, professional Star Schema.
  • Handle the “Grain” Problem (comparing daily sales vs. monthly budgets).
  • Master DAX for those high-stakes metrics like Year-to-Date (YTD) and Variance %.

I’m sharing my journey in public so that if you’re also preparing for the PL-300, you can follow along, build these solutions with me, and understand the why behind the architecture — not just the how.

For this project, we are using the Microsoft Financial Sample. It’s the perfect “blank canvas” because it comes as a flat, “messy” table that we have to re-engineer professionally.

How to get it: In Power BI Desktop, go to Home > Sample Dataset > Load Sample Data. Select the financials table.

Let’s get our hands dirty in Power Query.

Phase 1: Data Transformation (Power Query)

Before touching DAX or visuals, I slowed myself down and spent real time in Power Query. This is the part I used to rush through. Now I treat it as the foundation of everything that follows.
If the data model is shaky, no amount of clever DAX will save you.

Step 1: Data Profiling (a quick reality check)

Once I loaded the Microsoft Financial Sample dataset, the first thing I did was turn on column profiling:

  • Column quality
  • Column distribution
  • Column profile

When I turn on Column quality, distribution, and profile, I’m not trying to be thorough for the sake of it. I’m scanning for model-breaking issues before they turn into DAX headaches.

Column profiling immediately tells you:

  • Where nulls are hiding
  • Which columns are pretending to be dimensions
  • Which fields look numeric but behave like text
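
If you want these statistics as data rather than visual hints, Power Query can compute them directly. Below is a minimal M sketch in a blank query, assuming the fact query is named Financials (adjust to your own query name); Table.Profile returns one row per column, including null and distinct counts.

let
    // Assumed name of the fact query; rename to match yours
    Source = Financials,
    // One row per column: Min, Max, Count, NullCount, DistinctCount, ...
    Profile = Table.Profile(Source)
in
    Profile

The DistinctCount column is the same cardinality signal the distribution pane surfaces, which is exactly what the next steps lean on.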

1. Nulls & Data Type Mismatches

We’re in good shape here: Empty is 0% across the board, Valid is 100%, and Error is 0%. The data types all look correct too. Since we’re working with the sample financials dataset, that’s expected; there shouldn’t be any issues at this stage.

2. Cardinality: What Wants to Be a Dimension

Cardinality is simply how many unique values a column has. Power BI surfaces this immediately in Column distribution, and once you start paying attention to it, modeling decisions get much easier.

Here’s my rule of thumb:

  • Low cardinality (values repeat a lot) → likely a dimension
  • High cardinality (values are mostly unique) → fact-level detail

When I turn on column distribution, I’m asking two questions:

  • How many distinct values does this column have?
  • Do those values repeat enough to be useful for filtering or grouping?

If a column looks categorical but has thousands of distinct values, that’s a red flag.

Once I turned on Column distribution, the dataset started sorting itself for me.

Some columns immediately showed low cardinality — they repeated often and behaved like true categories:

  • Segment
  • Country
  • Product
  • Discount Band
  • Manufacturing Price
  • Sales Price
  • Date attributes (Year, Month Number, Month Name)

These columns had relatively few distinct values and clear repetition across rows. That’s a strong signal: these want to be used for grouping, slicing, and relationships. In other words, they naturally belong on the dimension side of a star schema.

Then there were the columns on the other end of the spectrum.

Measures like:

  • Units Sold
  • Gross Sales
  • Discounts
  • Sales
  • COGS
  • Profit

…showed very high cardinality. Many values were unique or nearly unique per row, with wide numeric ranges. That’s exactly what I expect from fact-level metrics — they’re meant to be aggregated, not filtered on.

That insight directly informed my next step: using Reference in Power Query to spin off Dim_Product and Dim_Geography, instead of guessing or forcing the structure.

Step 2: Spinning Dimensions with Reference (Not Duplicate)

This is the point where I stopped treating the dataset as a report-ready table and started treating it as a model-in-progress.
In Power Query, it’s tempting to right-click a table and hit Duplicate. I used to do that all the time. It works — but it quietly creates problems you only feel later.

Instead, I used Reference.
Why reference instead of duplicate, you might ask?

When you create a referenced table:

  • It inherits all upstream transformations
  • It stays logically tied to the source
  • Any fix in the fact table automatically flows downstream

From a real-world perspective, it’s just… safer.

Here’s how I created Dim_Product and Dim_Geography:

Starting from the main financial table:

  • I right-clicked the query and selected Reference
  • Renamed the new query to Dim_Product
  • Kept only product-related columns (Product, Segment, Discount Band)
  • Removed duplicates
  • Ensured clean data types and naming

What I ended up with was a small, stable table with low cardinality — perfect for slicing and grouping.
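
For reference, here is roughly what those applied steps look like in the Advanced Editor. This is a minimal sketch that assumes the fact query is named Financials; the code Power Query generates for you may differ slightly.

let
    // Reference (not Duplicate): inherits every upstream transformation
    Source = Financials,
    // Keep only the product-related attributes
    KeptColumns = Table.SelectColumns(Source, {"Product", "Segment", "Discount Band"}),
    // One row per unique combination
    RemovedDuplicates = Table.Distinct(KeptColumns)
in
    RemovedDuplicates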

I repeated the same approach for geography:

  • Reference the fact table
  • Keep the Country column
  • Remove duplicates
  • Clean text values

P.S. In this dataset, geography is represented only at the country level. Rather than forcing a region or city hierarchy that doesn’t exist, I modeled Country as a lean, single-column dimension.
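
The matching sketch for Dim_Geography, again assuming the fact query is named Financials. Text.Trim and Text.Proper stand in for the “clean text values” step; swap in whatever cleanup your data actually needs.

let
    Source = Financials,
    // Single-column dimension: Country only
    KeptColumn = Table.SelectColumns(Source, {"Country"}),
    RemovedDuplicates = Table.Distinct(KeptColumn),
    // Tidy the text values: trim stray spaces, standardise casing
    CleanedText = Table.TransformColumns(
        RemovedDuplicates,
        {{"Country", each Text.Proper(Text.Trim(_)), type text}}
    )
in
    CleanedText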

Step 3: Create a Dynamic Date Table

Here’s where I see a lot of Power BI models quietly fail PL-300 standards.

  • I didn’t import a static calendar.
  • I didn’t manually generate dates.
  • I built a dynamic date table in Power Query based on the data itself.

Why this matters:

  • It guarantees no missing dates
  • It automatically adjusts when new data arrives
  • It aligns perfectly with Microsoft’s modeling best practices

To create a dynamic date table, add a new Blank Query and open the Advanced Editor.

Here is the exact M code I used:

let
    // Anchor the calendar to the actual date range in the fact table
    Source = Financials,
    MinDate = Date.From(List.Min(Source[Date])),
    MaxDate = Date.From(List.Max(Source[Date])),

    // One row per day, with no gaps
    DateList = List.Dates(
        MinDate,
        Duration.Days(MaxDate - MinDate) + 1,
        #duration(1, 0, 0, 0)
    ),
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),

    // Standard calendar attributes
    AddYear = Table.AddColumn(DateTable, "Year", each Date.Year([Date]), Int64.Type),
    AddMonthNum = Table.AddColumn(AddYear, "Month Number", each Date.Month([Date]), Int64.Type),
    AddMonthName = Table.AddColumn(AddMonthNum, "Month Name", each Date.MonthName([Date]), type text),
    AddQuarter = Table.AddColumn(AddMonthName, "Quarter", each "Q" & Number.ToText(Date.QuarterOfYear([Date])), type text),
    AddDay = Table.AddColumn(AddQuarter, "Day", each Date.Day([Date]), Int64.Type)
in
    AddDay

This calendar:

  • Covers every date in the dataset
  • Scales automatically
  • Is ready for time intelligence the moment it hits the model

Once loaded, I marked it as a Date Table in the model view — non-negotiable for PL-300.

By the end of Phase 1, I had:

  • A clean fact table
  • Proper dimension tables created via Reference
  • A dynamic, gap-free date table
  • Transformations I could actually explain to another analyst

Nothing flashy yet — but this is the phase that makes everything after it easier, faster, and more reliable.

In the next section, I’ll move into data modeling and relationships, where this structure really starts paying dividends.

Phase 2: Data Modeling (From Tables to a Star Schema)

This is the phase where Power BI starts behaving like a semantic model.

By the time I switched to the Model view, I already had:

  • A clean fact table
  • Lean dimensions created via Reference
  • A dynamic, gap-free date table

Now the goal was simple: connect everything cleanly and intentionally.

Step 1: Establishing the Star Schema

I aimed for a classic star schema:

  • One central fact table (financial metrics)
  • Surrounding dimension tables (Dim_Date, Dim_Product, Dim_Geography)

Every relationship needed to answer three questions:

  • Which table is the “one” side?
  • Which table is the “many” side?
  • Does this relationship make sense at the grain of the data?

You might notice that I didn’t introduce surrogate keys for the fact or dimension tables. In this dataset, the natural keys — Country, Product, and Date — are stable, low-cardinality, and unambiguous. For this model, adding artificial IDs would increase complexity without improving clarity or performance.
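
Because Dim_Product and Dim_Geography were spun off the fact table with Reference, every fact value should already have a match in its dimension. It’s still a cheap check to run before trusting a many-to-one relationship. Here’s a minimal DAX query sketch (runnable in DAX query view or DAX Studio) using this model’s Country columns; an empty result means no orphaned values:

EVALUATE
EXCEPT (
    VALUES ( financials[Country] ),
    VALUES ( Dim_Geography[Country] )
)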

Here’s how the overall model looks:

Step 2: Relationship Direction (Single, on Purpose)

All relationships were set to:

  • Many-to-one
  • Single direction, flowing from dimension → fact

For PL-300 and real-world models alike, single-direction filters are the default until there’s a strong reason not to use them.

Step 3: Date Table as the Anchor

The dynamic date table I created earlier now became the backbone of the model.

I:

  • Related Dim_Date[Date] to the fact table’s date column
  • Marked Dim_Date as the official Date Table
  • Hid the raw date column in the fact table

This does three important things:

  • Enables time intelligence
  • Prevents accidental use of the wrong date field
  • Forces consistency across measures

From here on out, every time-based calculation flows through this table — no exceptions.

Step 4: Hiding What Users Don’t Need

This is a small step with an outsized impact. The PL-300 explicitly tests the idea that a model shouldn’t just be correct; it should be usable.

I hid:

  • Foreign keys (Date, Product, Country). If a column exists only to create relationships, it doesn’t need to appear in the Fields pane.
  • Raw numeric columns that should only be used through measures. Once my DAX measures exist (e.g. Total Sales, Total Profit), I can hide the raw numeric columns (Units Sold, Gross Sales, Discounts, Sales, COGS, Profit) in the fact table. This nudges users toward correct, consistent aggregations.
  • Duplicate date attributes in the fact table (Year, Month Number, Month Name). These already exist in the date table.

Step 5: Validating the Model (Before Writing DAX)

Before touching any measures, I did a quick sanity check:

  • Does slicing by Country behave correctly?
  • Do Product and Segment filter as expected?
  • Do dates aggregate cleanly by Year and Month?

If something breaks here, it’s a modeling issue — not a DAX issue.

To test this, I created a quick visual showing Sum of Profit by Year. Here’s how it turned out:

So far so good! Now we can move on to creating our DAX measures.
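
The same check can also be expressed as a query instead of a visual. Here’s a small sketch you could run in DAX query view or DAX Studio, using the table and column names from this model:

EVALUATE
SUMMARIZECOLUMNS (
    'Dim_Date'[Year],
    "Profit", SUM ( financials[Profit] )
)
ORDER BY 'Dim_Date'[Year]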

Phase 3: DAX Measures & Variance Analysis (Where the Model Starts to Shine)

This is the phase where the work I’d done in Power Query and the model really started paying off. Honestly, it’s the first time in a while that writing DAX didn’t feel like fighting the table. The star schema made everything… predictable.

Step 1: Base Measures — the foundation of sanity

I resisted my old instinct to drag raw columns into visuals. Instead, I created explicit measures for everything I cared about:

Total Sales :=
    SUM ( financials[Sales] )

Total Profit :=
    SUM ( financials[Profit] )

Total Units Sold :=
    SUM ( financials[Units Sold] )

Total COGS :=
    SUM ( financials[COGS] )

Step 2: Time intelligence without surprises

Because I already had a complete, properly marked date table, things like year-to-date or prior year comparisons were effortless.

Sales Year-to-Date

Sales YTD :=
    TOTALYTD (
        [Total Sales],
        'Dim_Date'[Date]
    )

Sales Prior Year

Sales PY :=
    CALCULATE (
        [Total Sales],
        SAMEPERIODLASTYEAR ( 'Dim_Date'[Date] )
    )

Step 3: Variance measures — turning numbers into insight

Once I had Actual vs Prior Period, I could calculate variance with almost no extra effort:

Sales YoY Variance :=
    [Total Sales] - [Sales PY]

Sales YoY % :=
    DIVIDE (
        [Sales YoY Variance],
        [Sales PY]
    )

Same approach for month-over-month:

Sales PM :=
    CALCULATE (
        [Total Sales],
        DATEADD ( 'Dim_Date'[Date], -1, MONTH )
    )

Sales MoM Variance :=
    [Total Sales] - [Sales PM]

Sales MoM % :=
    DIVIDE (
        [Sales MoM Variance],
        [Sales PM]
    )

Step 4: Why this actually feels “easy”

Here’s the honest part: writing DAX didn’t feel like the hardest thing. The hard part was everything that came before:

  • Cleaning the data
  • Profiling columns
  • Spinning out dimensions with Reference
  • Building a solid date table

By the time I got here, the DAX was just adding value instead of patching holes.

Good DAX isn’t clever — it’s predictable, trustworthy, and easy to explain.

Conclusion

The magic wasn’t in any single DAX formula — it was in how the model came together. By profiling the data early, understanding cardinality, and spinning dimensions with Reference, I built a structure that just works. A dynamic date table and clean relationships meant time intelligence measures and variance calculations ran effortlessly.

Hiding unnecessary fields and grouping measures thoughtfully made the model approachable, even for someone else exploring it for the first time. By the time I wrote the DAX for Actual vs. Prior Period or Month-over-Month variance, everything felt predictable and trustworthy.

If you want to see the full semantic model in action, including all the tables, relationships, and measures, you can download it here and explore how it ties together. There’s no better way to learn than seeing a working model in Power BI and experimenting with it yourself.

Wanna connect? Feel free to say hi on any of the platforms below

Medium

LinkedIn

Twitter

YouTube
