Programming protocols and techniques

Submitted by shevorne.desil… on Fri, 03/24/2023 - 21:05

Most applications and big data platforms available today support multiple programming languages with in-built scripting editors to make writing scripts and queries easier, especially for analysts who do not have previous experience with programming languages.  The type of programming language used in a big data analysis project would also depend on the type, source and structure of the raw data that needs to be processed.

In this topic, you will learn:

  • key protocols and techniques commonly applied to programming languages when working with big data
  • how to use DAX (Data Analysis Expressions) to perform big data analysis tasks in the Microsoft Power BI Desktop.

Let's begin.

Sub Topics
web codes on a computer laptop

DAX formulas are similar to those used in Excel tables. However, there are some key differences.

DAX formulas are often written as calculations and come in three forms.23

  • Calculated measures – used to perform aggregations (e.g. calculate totals, count distinct values etc.)
  • Calculated columns – used to add additional columns to tables in a dataset
  • Calculated tables – used to create a new table with required columns that can have aggregated values.

Let us understand the basic rules that you need to follow when writing DAX formulas and specific object naming requirements. We need to consider the structure or format for a DAX formula, then review some commonly used function and how to build statements. Finally, these can be assembled to create a query to help make sense of big data and build your analysis.

Basic rules

When writing DAX formulas for calculated columns or measures it is important to follow these general rules or protocols.

  • A DAX formula always starts with an equal (=) sign
  • After the equals sign, comes the expression, which may include elements such as statements, functions, operators and values. It is important to note that expressions:
    • are always read from left to right
    • use parentheses to control the order in which the elements are grouped.
  • A DAX function always references a complete column, measure or table. This is commonly known as the fully qualified name of that object. For example:
    • The fully qualified name of a column refers to the table name, followed by the column name in square brackets. For example, 'AUS Order'[Products].
    • The fully qualified name of a measure refers to the table name, followed by the measure name in square brackets. For example, 'AUS Order'[Profit].

Object naming requirements

  • All object names are case-insensitive. (e.g. The table names 'SALES' and 'Sales' would represent the same table name).
    • Note that the term object here refers to tables, columns and measures.
  • The names of objects must be unique within their current context. For example:
    • measure names must be unique within a model
    • column names must be unique within a table
    • table names must be unique within a database
  • Leading or trailing spaces that are enclosed by name delimiters, brackets, or single apostrophes are valid in the names of tables, columns, and measures.
  • If the name of a table contains any of the following, you must enclose the table name in single quotation marks.
    • Spaces (e.g. 'Order Detail')
    • Reserved keywords (e.g. 'Analysis Services' cannot be used as a table name unless it is enclosed in quotation marks)
    • Disallowed characters .,;':/\*|?&%$!+=()[]{}<>
    • Characters outside the ANSI alphanumeric character range

Following are some examples of object names used within DAX formulas.

Example Object type Comments on the naming requirement
Customer Table name This table name does not contain any spaces or other special characters. Therefore, the name does not need to be enclosed in quotation marks.
'AUS Orders' Table name This table name contains a space therefore it needs to be enclosed in single quotation marks.
'Discount%' Table name This table name contains a special character. Therefore it needs to be enclosed in single quotation marks.
Orders[Cost] Fully qualified column name The table name precedes the column name and the column name is enclosed in square brackets.
Orders[Profit] Fully qualified measure name The table name precedes the measure name and the measure name is enclosed in square brackets.
[Cost] Unqualified column name

This is just the column name in square brackets. This can be used in certain situations in formulas such as:

  • in calculated columns within the same table
  • in aggregation function that scans over the same table.
'AUS Order'[Cost] Fully qualified column in table with spaces The table name contains spaces, so it must be enclosed in single quotation marks.

Operators

There are different types of operators used when writing DAX formulas. Some of the frequently used DAX operators are listed in the following table.24

Operator type Metacharacter symbols Meaning
Arithmetic operators + Addition
- Subtraction
/ Division
* Multiplication
Comparison operators = Equal to
== Strict equal to
> Greater than
< Less than
<= Less than or equal to
>= Greater than or equal to
<> Not equal to
Text concatenation operator & Connects two values to produce one continuous text value
Logical operators && AND condition
|| OR condition
Parenthesis operator () Used for precedence order and grouping of arguments

For more information on DAX operators, refer to the DAX operators - DAX | Microsoft Docs.

Knowledge check

Complete the following seven (7) questions. Click the arrows to navigate between the questions.

Now that you are aware of the general rules for writing DAX formulas, let’s look into the structure and basic syntax of a DAX formula.

Man's hand on webcoding

Basic elements of a DAX statement

Before you can write your own DAX functions you need to be familiar with the various elements that make up a DAX formula. 25

The following represents the basic elements that make up a DAX statement.

Elements of a DAX statement

Simply, what this DAX statement does, is to create a measure named Total Costs that calculates (=) the SUM() of values in the CostAmount column in the Orders table.

Let’s understand each of the elements in this DAX measure.

  1. The name of the new measure (e.g. Total Costs)
  2. The DAX formula (e.g. begins with the equals sign operator and ends with close parenthesis)
  3. Marks the beginning of the formula is indicated by an equals sign operator (=)
  4. DAX function (e.g. SUM() – used to add all values in a given table’s column such as Orders[CostAmount])
  5. Parenthesis (), which surrounds an expression that contains one or more arguments. Most functions require at least one argument. An argument passes a value to a function.
  6. The name of the table that the formula applies to (e.g. Orders)
  7. The name of the column that the formula applies to (e.g. CostAmount)

The following examples demonstrate how the DAX formula syntax can be used to create a calculated column, measure and table. In each of these examples notice how the syntax rules are followed.

1 of 3
 
  1. DAX column/measure name (e.g. Profit, Total Profit, Customer)
  2. DAX Formula

Adding comments in DAX scripts

Adding comments will not affect the performance of the scripts/queries but would make it easier for others to understand the written code and make changes later on if needed.

Following are the different methods of commenting.

In-line comments

Method: Start your comment with two hyphens/dashes (--).

In the following DAX statement, notice the use of hyphens to embed the comment inline. This indicates that anything after the hyphens is the comment text.

Orders = SELECTCOLUMNS('Company Data', --this is an inline comment
  "Customer ID", [Customer ID],
  "Customer Name", [Customer Name],
  "Customer Address", [Customer Address]
)

Single-line comments

Method: Begin the comment with the // symbol.

In the following DAX statement, notice the use of the // symbol to comment the entire line of text.

Orders = SELECTCOLUMNS('Company Data',
  "Customer ID", [Customer ID],
  // this is a single-line comment
  "Customer Name", [Customer Name],
  "Customer Address", [Customer Address]
)

Commenting-out multiple lines of code

Method: Place the code in between /* and */ symbols.

In the following DAX statement, the specific two lines of the script won’t execute because it is commented out.

Orders = SELECTCOLUMNS('Company Data',
  "Customer ID", [Customer ID]
  /*
    this is a multi-line comment
    "Customer Name", [Customer Name],
    "Customer Address", [Customer Address]
  */
)

Complete the following activity to check your understanding of the DAX formula syntax and commenting methods.

Knowledge check

Complete the following three (3) questions. Click the arrows to navigate between the questions.

Man working on work desk with codes

DAX functions help to perform commonly used data calculations on data models, databases, tables etc.

Types of DAX functions

There are a variety of DAX functions that are built into the DAX language that can be used for various purposes when writing DAX formulas. It is important to note that each DAX function, has specific syntax rules that must be followed to ensure the script executes without any errors.

For more information on DAX functions and their specific syntax rules, refer to the DAX function reference - DAX | Microsoft Docs.

The following table explores and provides examples of some of the DAX functions, their purpose and how they can be used when writing queries.

Function type DAX functions What it does Example
Aggregation functions
SUM()
			
Used to add all the numbers that are in a column.

The following example adds all the numbers in the Cost column from the Sales table.

=SUM(Sales[Cost])
SUMX()
			
Returns the sum of an expression evaluated for each row in a table.

The following example shows the SUMX function taking the Sales table as its first argument. The second argument contains the expression that instructs to multiply the Unit Price and Discount% fields for each row of the Sales table. This ultimately returns the sum of the discounted unit prices. 

=SUMX(Sales,
 Sales[Unit Price] * Sales[Discount%]
 )
COUNT()
Counts the number of rows in the specified column that contain non-blank values.

The following example returns a whole number after counting the values in the Product ID column in the Product table

=COUNT(Product[Product ID])
DISTINCTCOUNT()
			
Used to return the number of distinct values in a given column.

The following example returns the calculated number of distinct values from the Category column from the Products table.

=DISTINCTCOUNT(Category)
AVERAGE()
			
Returns the average (arithmetic mean) of all the numbers in a column.

The following example returns the average of the Revenue column from the Sales table.

=AVERAGE(Sales[Revenue])
MAXA()
			
Returns the largest value in a column.

The following example returns the largest value of the Revenue column from the Sales table.

=MAXA(Sales[Revenue])
MINA()
			
Returns the smallest value in a column.

The following example returns the minimum value of the Revenue column from the Sales table.

=MINA(Sales[Revenue])
Math and trig functions
DIVIDE()

Used to perform divisions and returns an alternate result or BLANK() on division by 0.

Refer to: DIVIDE function vs divide operator (/) in DAX - DAX | Microsoft Learn to understand why this function is used instead of ‘/’ to perform divisions.

The following example returns 5.

=DIVIDE(10,2)

The following example returns a blank.

=DIVIDE(10,0)

The following example returns 1.

=DIVIDE(10,0,1)
Statistical functions
MEDIAN()
Returns the median of numbers in a column.

The following example returns the median value of the Revenue column from the Sales table.

=MEDIAN(Sales[Revenue])
STDEV.P()
Returns the standard deviation of the entire population.

The following example returns the standard deviation value of the Revenue column from the Sales table.

=STDEV.P(Sales[Revenue])
PERCENTILE.INC()

Returns the kth percentile of values in a range, where k is in the range 0-1 inclusive.

Refer to PERCENTILE.INC function (DAX) - DAX | Microsoft Learn for more details.

The following example returns the first percentile value of the Revenue column from the entire Sales table.

=PERCENTILE.INC(Sales[Revenue,0.25])

The following example returns the third percentile value of the Revenue column from the entire Sales table.

=PERCENTILE.INC(Sales[Revenue,0.75])

NORM.DIST()

Returns the normal distribution for the specified mean and standard deviation.

Refer to NORM.DIST function (DAX) - DAX | Microsoft Learn for more details.

Returns the normal distribution, for the X (32) value, having a mean of 30 and standard deviation of 1.5, for a cumulative distribution.

=NORM.DIST(32,30,1.5,TRUE)

Additional resources

Refer to the following sources to learn more about DAX syntax and how it can be used.

Knowledge check

Complete the following task. 

Putting it all together

Your focus at this point should be on learning the syntax and the basic rules for using DAX statements.

Note: You will learn to use Power BI Desktop later in topic 6 of this module.

The following video demonstrates how the statements, functions, values and operators work together in DAX queries and the output it generates in Power BI Desktop.

In the following series of examples and activities, you will see how statements, functions, values and operators all work together in DAX queries.

Practical activity - Writing DAX statements

Refer to the data table and answer the following questions. For each question,  attempt to write DAX queries yourself before looking at the solutions and answers provided.

Orders
Order ID Product ID Product Name Unit Price Unit Cost Quantity Sales
001 101002 Colored Pencils $ 3.50 $ 1.00 5 $17.50
001 101004 Flourescent Highlighters $ 5.00 $ 2.50 2 $10.00
002 101001 Crayons $ 7.50 $ 3.00 1 $7.50
002 101003 Dustless Chalk Sticks $ 2.50 $ 1.50 2 $5.00
002 101004 Flourescent Highlighters $ 5.00 $ 2.50 1 $5.00
003 101003 Dustless Chalk Sticks $ 2.50 $ 1.50 10 $25.00
003 101004 Flourescent Highlighters $ 5.00 $ 2.50 10 $50.00
Question 1

Create a calculated measure called 'Total Sales1' that adds all the values in the Sales field for all records in the Orders table.

Total Sales1 = SUMOrders[Sales] )
Question 2

Create a calculated measure called 'Total Sales2' that calculates the total sales value by multiplying the Unit Price and Quantity fields in the Orders table and adding all those values to get the total sales value as the final result.

Total Sales2 = SUMXOrders, Orders[Unit Price] * Orders[Quantity])
Question 3

Create a calculated column called 'Cost' to calculate the total cost of each product sold in each row of the Orders table.

Cost = Orders[Cost] * Orders[Quantity]
Question 4

Create a calculated measure called 'Gross Profit' that calculates the difference between the previously created measures Total Sales1 and Total Cost.

Gross Profit = Orders[Cost] * Orders[Quantity]
Question 5

Create a calculated measure called 'Profit Margin' that calculates the division of the 'Gross Profit' value and ensures that a value of '0' is indicated if the returned value equals '0'.

Profit Margin = DIVIDE Orders[Cost] * Orders[Quantity] )
Man working on two laptops and desktop monitor

Topic summary

Congratulations on completing your learning for this topic Programming protocols and techniques.

In this topic you learnt the following.

  • Introduction to Data Analysis Expressions (DAX)
  • DAX formula syntax
  • DAX functions
  • Writing DAX statements

Check your learning

The final activity for this topic is a set of questions that will help you prepare for your formal assessment.

Knowledge check

Complete the following three (3) questions. Click the arrows to navigate between the questions.

Assessments

Now that you have completed the basic knowledge required for this module, you are ready to complete the following assessment event:

  • Assessment 2 (Online Quiz)

What’s next?

Next, we will dive into how to identify business requirements relating to using big data for operational decision-making.

Module Linking
Main Topic Image
Team of data specialists on discussion
Is Study Guide?
Off
Is Assessment Consultation?
Off