Requirements for using big data in business


In this topic you will learn the requirements for using big data in business operations. These requirements include having an understanding and awareness of the:

  • domain knowledge of business processes
  • legislative requirements, including data protection, privacy laws and regulations
  • organisational policies and procedures
  • technology platforms for big data analytics.

Let us explore each of these requirements in detail.


Importance of domain knowledge

Obtaining domain knowledge of business processes is essential for interpreting and using big data for operational decision-making.

A diagram showing importance of domain knowledge

Here are some reasons why:

  • Understanding the context: Domain knowledge helps analysts and decision-makers understand the context in which data is generated. This includes understanding the inputs, processes, and outputs of the business, as well as the factors that influence performance.
  • Identifying relevant data: With domain knowledge, analysts can identify the most relevant data sources for a given problem, and distinguish between meaningful and irrelevant data. This ensures that the analysis is focused on the most important factors affecting business performance.
  • Interpreting results: Big data analysis can produce complex and detailed results, which can be difficult to interpret without domain knowledge. Understanding the business processes allows analysts to put the results in context, identify patterns and trends, and draw actionable insights.
  • Validating results: Domain knowledge is also essential for validating the results of big data analysis. Analysts can use their knowledge of the business processes to check if the results make sense, and if they are consistent with other sources of information.
  • Making informed decisions: Ultimately, the goal of big data analysis is to support decision-making. With domain knowledge, decision-makers can use the insights generated from big data analysis to make informed decisions that improve business performance.

In summary, domain knowledge of business processes is crucial for interpreting and using big data effectively for operational decision-making. Without it, analysts and decision-makers may struggle to understand the context, identify relevant data, interpret results, validate results, and make informed decisions.

Ways to obtain domain knowledge

There are several ways to obtain domain knowledge in an organisation or industry.

A diagram outlining how to obtain domain knowledge

Here are some methods you can consider.

  • On-the-job training: One of the most effective ways to gain domain knowledge is through on-the-job experience. Employees can learn about the processes and factors that impact business performance by working on projects, interacting with customers or stakeholders, and carrying out business operations.
  • Direct observation: One of the best ways to learn about the domain is by working closely with experienced colleagues, observing how they approach tasks, and asking them questions to gain a deeper understanding of the business processes.
  • Mentorship: Having a mentor who is knowledgeable in the business domain can be incredibly helpful for employees seeking to gain domain knowledge. Mentors can provide guidance, answer questions, and offer insights that help employees understand business processes and operations.
  • Asking questions: Asking questions of subject matter experts, employees and other staff who undertake business operations helps you gain a deeper understanding of the business processes.
  • Referring to organisational documentation: Various organisational documentation provides useful information about the business and details how business operations are carried out. Examples of this documentation include policies, procedures, business process documents, workflow diagrams, etc.
  • Investigating sample datasets: A wide variety of datasets are generated by business processes. Investigating these sample datasets can provide insights and knowledge of the business and its operational workflow. Analysts can build test reports (slicing and dicing) using these available datasets to understand what other information can be obtained about the business and its operations (see the sketch after this list).
  • Training programs: Many companies offer training programs designed to help employees gain domain knowledge. These programs may cover topics such as the company's products and services, industry trends, and best practices for business operations.
  • Industry conferences and seminars: Attending industry conferences and seminars can be a great way to learn about the latest trends and developments in the business domain. These events offer opportunities to network with industry experts and learn about new technologies and strategies.
  • Online resources: Many online resources are available to help employees learn about business processes and operations. This includes industry blogs, online forums, and e-learning platforms that offer courses and tutorials on a wide range of business-related topics.
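
As a concrete illustration of the 'investigating sample datasets' point above, here is a minimal sketch in Python using pandas. The file name (sales_sample.csv) and the column names are hypothetical assumptions for illustration only, not part of any particular organisation's data.

```python
import pandas as pd

# Load a hypothetical sample dataset exported from a business system.
# The file name and column names are illustrative assumptions.
sales = pd.read_csv("sales_sample.csv", parse_dates=["order_date"])

# Profile the data: column types, missing values, value ranges.
print(sales.info())
print(sales.describe(include="all"))

# "Slice and dice" the data to learn how the business operates:
# total and average sales by region and product category.
summary = (
    sales.groupby(["region", "product_category"])["amount"]
         .agg(["count", "sum", "mean"])
         .sort_values("sum", ascending=False)
)
print(summary.head(10))

# Monthly trend: how does revenue move over time?
monthly = sales.set_index("order_date")["amount"].resample("M").sum()
print(monthly)
```

Building small exploratory summaries like these is often enough to surface questions for subject matter experts and to confirm your understanding of the underlying business process.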

In summary, obtaining domain knowledge as an employee involves a combination of on-the-job experience, mentorship, training programs, industry conferences, and online resources. By leveraging these resources, employees can develop a deeper understanding of the business domain, and apply this knowledge to make better decisions and contribute to the company's success.

Knowledge check

Answer the following four (4) questions. Click the arrows to navigate between questions.

Legislative requirements

Legislative requirements are an important consideration in any big data project. They help specify certain boundaries for the project and identify any limitations on using specific data types.

The identification, collection, use and storage of data must be carefully controlled throughout the project and within each stage of the data analysis process.

For example, the following video explains how the Australian Public Service (APS) is committed to protecting the personal information of Australians. Pay close attention to how you can identify ‘personal information’ and how this information should be handled.

Based on what you learnt from the video, answer the following questions.

Data protection and privacy concerns

When carrying out data analytics activities, especially in Australia, organisations must take the necessary measures to protect the privacy rights of individuals throughout the project. Data protection involves securing data throughout its collection, storage, access and use.

The following are some of the issues that businesses need to consider when working with big data.7

  • Data protection – regulatory requirements dictate that personal data must be processed for specified and lawful purposes and that the processing must be adequate, relevant and not excessive.
  • Privacy – when dealing with confidential and sensitive information included in the big data being tested, businesses need to be transparent on how the information is handled and protected and demonstrate that confidentiality is maintained at all stages.

Data protection and privacy laws and regulations

The following video explains what ‘privacy information’ is and how personal information must be protected under the Privacy Act in Australia.

 

The following are examples of laws and regulations that apply to businesses in Australia for protecting data and privacy.

Privacy Act in Australia

The Privacy Act 1988 makes provisions to protect the privacy of individuals in Australia. Therefore, businesses that deal with the personal information of individuals in Australia must ensure that they comply with this legislation.
The Australian Privacy Principles (APPs)

Under the Privacy Act 1988, the Australian Privacy Principles (APPs) provide guidelines for collecting, managing, and using personal information. Refer to the following sources and reference guides published by the Office of the Australian Information Commissioner (OAIC) for more detailed information about the APPs.8

General Data Protection Regulation (GDPR)

The European Union (EU) sets out its data protection requirements in the General Data Protection Regulation (GDPR). A business operating in the EU, or collecting information about individuals in the EU, will need to understand the ramifications of the GDPR. While some GDPR requirements align with the APPs, there are also stark differences and complexities.

According to the guidance document Australian entities and the EU General Data Protection Regulation (GDPR) published by OAIC:

Australian businesses of any size may need to comply if they have an establishment in the EU, if they offer goods and services in the EU, or if they monitor the behaviour of individuals in the EU. 9
Laws and regulations of specific industries

As well as the Privacy Act, several other laws cover privacy. They apply to areas such as financial transactions, health information and telecommunications. Some examples are given below. 

Laws and regulations of States and territories

The Privacy Act is a federal law, and different states and territories have additional legislation. Further information about what may apply is available on the OAIC website.

Here are some examples of data protection and privacy legislation as it applies to different states and territories in Australia.

    Related legislation

    Related legislation that may apply to the collection and processing of data, as outlined by OAIC, includes: 10

    Knowledge check

    Answer the following seven (7) questions. Click the arrows to navigate between questions.

    Government agencies have a set of guiding principles allowing some private information sharing. Watch the following video to learn why this is allowed and what type of information can be shared.

    Legislative requirements for accessing and using big datasets and summaries

    Overall, when using data summaries from a third-party resource, it is important to be aware of the relevant copyright requirements and to ensure that you comply with any licensing terms or other controls that apply to the use of the data.

    Copyrights and Creative Commons (CC) licensing

    When accessing datasets from third parties, it is important to consider the copyright and licensing rules that apply. Licensing determines the terms and conditions under which the dataset can be accessed, used, and shared.

    Some common types of licenses for datasets include Creative Commons (CC) licenses, a set of standardised licenses that allow creators of original works, including datasets, to specify the terms under which others can use and share their work.11 Careful review of the terms of the license before accessing or using the dataset is vital to ensure compliance and avoid any legal issues.

    A thorough review of the terms of the CC license before using a third-party dataset is also important. Some CC licenses may have restrictions on commercial use, modifications, or distribution, which could limit the ways in which the data can be used. In addition, it is important to properly attribute the dataset in accordance with the terms of the CC license to give credit to the original creator.

    Read through the article Data Sharing Agreements and Licenses from the Data.NSW official website on open data licensing.

    An example from the Australian Bureau of Statistics (ABS)

    The Australian Bureau of Statistics (ABS) provides a range of data products, including summary tables, data cubes, and other statistical outputs. In general, the use of data from ABS products is subject to copyright laws, which protect the original expression of ideas and information. The following are some key copyright requirements that apply to data summaries from the ABS.12

    • Acknowledgment of source: When using data from ABS products, it is important to acknowledge the source of the data by including a citation or attribution statement that clearly identifies the ABS as the source of the data.
    • Compliance with ABS licensing terms: The ABS provides various licenses for the use of its data, which may include restrictions on the use, reproduction, or dissemination of the data. It is important to comply with these licensing terms when using ABS data.
    • Fair dealing: In Australia, the concept of "fair dealing" provides some limited exceptions to copyright infringement for purposes such as research, criticism, review, news reporting, or education. However, it is vital to ensure that any use of ABS data under fair dealing provisions is done in a reasonable and proportionate manner and does not unduly impact the value or integrity of the original work.
    • Permission for commercial use: If you wish to use ABS data for commercial purposes, such as in a product or service that you plan to sell or distribute, you may need to obtain permission from the ABS or pay a licensing fee for the use of the data.
    Knowledge check

    Complete the following three (3) questions. Click the arrows to navigate between questions.

    Organisational policies and procedures

    A person checking company policy documents

    An organisation must have clear policies and procedures to run smoothly and efficiently. Procedures and policies must align, but they are not the same thing.

    • Policies are guidelines that outline an organisation’s rules and requirements. For example, a policy could state requirements for workplace safety or employee conduct.
    • Procedures are the steps or specific actions required to carry out a policy. For example, a procedure may set out how to perform a safety audit or how to meet employee conduct expectations.

    Organisations will have policies and procedures covering all aspects of the business, including guidelines and requirements for handling and analysing big data and for how the results are stored and disseminated. You must work within your organisation’s policies and procedures at all times, particularly those relevant to big data testing.

    Policies and procedures must be followed at different stages of the data-driven decision-making process.

    You must be particularly familiar with the following policies and procedures for this unit.

    Procedure for scoping business requirements for using big data

    Provides guidelines on investigating the business processes and workflows to identify opportunities to use big data for operational decision-making.

    It outlines specific processes for gathering key requirements that set out the scope of the project, such as the nature, size, scale, timelines and reporting needs.

    Following this procedure ensures that the operational decision-making requirements are well-defined and aligned with the needs and expectations of all stakeholders, which can help to improve the usefulness and efficiency of the decision-making process.

    Procedure for confirming business requirements for using big data

    This procedure includes specific business processes and best practices for verifying the relevance and practicality of the identified operational decision-making needs of the project.

    This may include recommendations for consulting required stakeholders using appropriate modes of communication.

    Data access policy

    The policy typically includes information such as the purpose of the dataset, the types of users allowed to access the data, the procedures for requesting access, and the terms and conditions of use. It may also outline any restrictions on the use of the data, such as limitations on commercial use or data sharing.

    The goal of a Data Access Policy is to ensure that the dataset is used in a responsible and ethical manner and to protect the privacy and confidentiality of any individuals whose data is included in the dataset.

    Procedure for combining external big data sources

    Provides guidelines on how to combine external big data sources, such as social media data, with in-house big data. External big data sources may include data in different formats, so these must be transformed into a structured form before being combined with in-house data.

    Different approaches or strategies must be used when handling different formats of data. Some of the key tasks outlined in this procedure document include the following (a brief code sketch follows the list).

    • Clean the data: Data is cleaned to remove errors, inconsistencies, or duplicates. This may involve using data cleansing tools and techniques such as outlier detection and data profiling.
    • Format the data: Data is formatted to ensure that it is consistent. This may involve standardising data types, units of measure, and data structures.
    • Organise the data: Data is organised into a logical structure to be easily analysed. This may involve creating a data model or a schema that defines the relationships between the different data elements.
    • Merge the data: External data is merged with in-house data using data integration tools and techniques such as ETL (extract, transform, load) processes, fuzzy matching and so on.
    • Verify the data quality: The quality of the merged data is verified using data quality metrics such as completeness, accuracy and consistency.
    • Store the data: The merged and cleaned data is stored in a centralised repository such as a data warehouse or a data lake.
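
The following is a minimal sketch of these tasks in Python using pandas. The file names, column names and matching key are hypothetical assumptions for illustration; an organisation's actual procedure would name its own sources, tools and repository.

```python
import pandas as pd

# 1. Ingest: an in-house structured extract and an external, differently formatted source.
#    File and column names below are illustrative assumptions.
inhouse = pd.read_csv("customers.csv")
external = pd.read_json("social_mentions.json")

# 2. Clean: remove duplicates and rows missing the matching key.
external = external.drop_duplicates().dropna(subset=["email"])

# 3. Format: standardise data types and values so both sources are consistent.
inhouse["email"] = inhouse["email"].str.strip().str.lower()
external["email"] = external["email"].str.strip().str.lower()
external["mention_date"] = pd.to_datetime(external["mention_date"])

# 4. Merge: join the external data onto the in-house data using the common key.
merged = inhouse.merge(external, on="email", how="left")

# 5. Verify quality: a simple completeness check on the merged result.
completeness = merged["mention_date"].notna().mean()
print(f"Share of customers matched to external data: {completeness:.1%}")

# 6. Store: in practice this would be written to a data warehouse or data lake;
#    here it is simply saved to a file.
merged.to_csv("customers_with_social.csv", index=False)
```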
    Data analysis procedure

    These procedures provide guidelines for evaluating and investigating the big dataset to find trends and forecasts that identify business insights. They include instructions on the following (a simple scripted example follows the list):

    • specific methodologies to use during the analysis
    • visualisation guidelines and specifications
    • how statistical analysis can be conducted to verify initial analysis results
    • sample formulas and scripts that can be used when conducting the analysis
    • how analysis reports should be developed.
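
As a hedged illustration of the kind of scripted analysis such a procedure might reference, the sketch below computes a monthly trend and a deliberately naive forecast with pandas. The dataset, column names and method are assumptions for illustration, not a prescribed organisational procedure.

```python
import pandas as pd

# Hypothetical operational dataset with a timestamp and a measure of interest.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Aggregate to a monthly series to expose the underlying trend.
monthly = orders.set_index("order_date")["order_value"].resample("M").sum()

# Smooth the series with a 3-month rolling mean so the trend is easier to read.
trend = monthly.rolling(window=3).mean()

# A very naive forecast: project the average month-on-month change forward.
avg_change = monthly.diff().mean()
forecast_next_month = monthly.iloc[-1] + avg_change

print(trend.tail())
print(f"Naive forecast for next month: {forecast_next_month:,.2f}")
```

In practice, the procedure would also specify which statistical checks to run to verify the initial results, which is where more rigorous forecasting methods would replace the naive projection above.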
    Procedure for integrating big data analytics into operational workflows

    These procedures include guidelines on effectively incorporating the insights obtained by analysing big data into the organisation’s business processes. This often outlines best practices and guidelines on the multi-step process to be followed (which differs from organisation to organisation). This process generally requires careful planning, designing and implementing the workflow to incorporate the insights and patterns identified through the analysis.

    This generally involves the following key steps.13

    1. Source and configure data capture – involves obtaining the required data, implementing processes to capture data from workflows relevant to the business operations in the analysis and storing the data in a repository for easy and secure access.
    2. Identify key measures and metrics – involves identifying the key metrics required for the analysis to help analyse and monitor:
      • insights from the data
      • results of the operational decisions
      • the progress of the business operations and processes.
    3. Analyse operational insights – involves analysing the data to obtain insights relevant to specific operational decisions, understand historical performance, and identify patterns to conduct forecasts and predictions. An example of a type of analysis performed is a what-if scenario analysis, which helps to evaluate outcomes against different parameters (see the sketch after this list).
    4. Utilise insights for action – involves applying the insights from the analysis to make operational decisions. This may include building capabilities that allow employees to action the insights from the analysis.
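
To make the what-if idea in step 3 concrete, here is a minimal sketch in Python. It varies one operational parameter (staffing level) and recomputes an outcome (orders processed per day) for each value; the model and the numbers are invented purely for illustration.

```python
# A hedged what-if sketch: evaluate an outcome across a range of parameter values.

def orders_processed_per_day(staff: int, orders_per_staff: float = 35.0,
                             overhead_loss: float = 0.05) -> float:
    """Toy operational model: throughput scales with staff, minus a fixed overhead loss."""
    return staff * orders_per_staff * (1 - overhead_loss)

# What-if scenarios: what happens if we roster 8, 10 or 12 staff on a shift?
for staff in (8, 10, 12):
    outcome = orders_processed_per_day(staff)
    print(f"Staff: {staff:2d} -> estimated orders processed per day: {outcome:.0f}")
```

The same pattern scales up: a richer model of the business process replaces the toy formula, and the loop sweeps whichever parameters the operational decision depends on.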
    Policies and procedures for reporting on big data analysis

    Provides best practices and guidelines to help communicate the analysis of the big dataset using the appropriate technology platforms and reporting tools.

    These procedures include instructions on the following:

    • how to present analysis results clearly and consistently using standard representations (e.g. charts, graphs, etc.)
    • specific report structures that should be used (formats, style guides, etc.)
    • which technology platform/tool to use, and specific guidelines on how to use certain reporting features of the tool
    • how to store the analysis results and supporting evidence considering relevant legislative requirements.
    Procedure for applying visual analytic standards

    Provides a set of best practices to create effective visualisations that are easy to interpret and communicate insights effectively. Some examples of best practices include the following (illustrated in the sketch after this list).

    • Avoid cluttered designs, use appropriate axis labels and ensure the visualisation is accessible to all users.
    • Choose appropriate design elements such as colour, layout, and typography to make the visualisation easy to interpret.
    • Choose the appropriate visualisation type.
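
The sketch below applies these practices with matplotlib, a common Python charting library: an appropriate chart type for a simple comparison, clearly labelled axes, a single restrained colour and an uncluttered layout. The data is made up purely for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures, used only to illustrate the visual standards above.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 162, 158]

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(months, sales, color="#4472C4")  # one restrained colour, appropriate chart type

# Clear, descriptive labels so the chart can be read without extra explanation.
ax.set_title("Monthly sales, January to June")
ax.set_xlabel("Month")
ax.set_ylabel("Sales ($'000)")

# Reduce clutter: remove unnecessary chart borders.
ax.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.savefig("monthly_sales.png", dpi=150)
```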
    Stakeholder feedback policy

    Includes guidelines on identifying:

    • stakeholders who are impacted by the analysis (e.g. customers, employees, regulators, shareholders, and other external stakeholders)
    • the recommended communication methods and techniques to collect relevant participants' comments and opinions (e.g. surveys, feedback forms, email, social media, or other communication channels).

    These guidelines may also include procedures for informing stakeholders of the changes made due to their feedback.

    Procedure for distributing analytic reports

    Provides guidelines on:

    • how to store analytic reports with appropriate security measures for access by required personnel
    • how to distribute big data analytic reports to the intended stakeholders through appropriate channels, such as email, a web portal, or in-person meetings.
    Knowledge check

    Answer the following eight (8) questions. Click the arrows to navigate between questions.

    Technology platforms for big data analytics

    A close view of a person typing on a keyboard

    There are a wide variety of technology platforms available today that enable the use of big data. The choice of platform typically differs from organisation to organisation, and some big data platforms may have additional or more advanced tools and features than others.

    Platforms

    A big data platform is more than a repository for data.

    A data platform is a complete solution for ingesting, processing, analysing and presenting the data generated by the systems, processes and infrastructures of the modern digital organisation. 14

    The following video explains more about the functions of big data platforms.

    Other platforms with different offerings are summarised at What is a Data Platform?.

    Optimisation and simulation tools for what-if scenarios

    Tools for optimisation and simulation include various software packages such as Excel Solver, R, Python, Power BI and Tableau. These tools can be used to create what-if scenarios, optimise decision-making, and run simulations to explore different outcomes.
    Let us explore some of these optimisation and simulation tools used to develop what-if analysis scenarios.

    What If Tool (WIT)

    This is a simulation tool where you can test performance in proposed situations (what-if scenarios). It also helps to analyse the importance of different data features and visualise model behaviour across multiple models and subsets of input data. It can be optimised for different Machine Learning (ML) fairness metrics.

    The following video provides a brief introduction to this tool and demonstrates how it is used for data modelling and simulations at a high level.

    Power BI and Tableau

    These technology platforms provide tools that use parameters to simulate and test the feasibility of what-if scenarios for operational decision-making.

    To learn more, refer to the following articles.

    Microsoft Excel add-ins and tools

    There are a variety of ‘What-if Analysis’ tools available in Microsoft Excel that take sets of input values and determine possible results.

    The following video introduces the what-if analysis tools available in Microsoft Excel and demonstrates the use of a feature called ‘Goal Seek’ at a very basic level.

    Read the article Introduction to What-If Analysis by Microsoft Support to learn more about these tools. The following are some of these tools and add-ins.

    Goal Seek

    Goal Seek can be used to determine the desired result from a formula. It works with only one variable input value.

    For more information, refer to the Microsoft Support article Use Goal Seek to find the result you want by adjusting an input value.
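
Goal Seek itself is an Excel feature, but the underlying idea (adjusting one input until a formula hits a target value) can be sketched in Python. The profit formula and figures below are invented, and the bisection search is only an illustrative analogue, not how Goal Seek is implemented.

```python
# A hedged Python analogue of Goal Seek: find the one input value that makes a
# formula hit a target, using a simple bisection search on an increasing function.

def profit(units_sold: float) -> float:
    """Toy formula: revenue of $25 per unit minus $12,000 of fixed costs."""
    return 25 * units_sold - 12_000

def goal_seek(f, target, lo, hi, tol=1e-6):
    """Adjust the input between lo and hi until f(input) is within tol of target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# What number of units gives exactly $0 profit (the break-even point)?
break_even = goal_seek(profit, target=0, lo=0, hi=10_000)
print(f"Break-even units: {break_even:.0f}")  # prints 480
```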

    Solver

    This is an Excel add-in with more powerful ‘What-If Analysis’ features: it can accommodate more variables and can create forecasts using various commands built into Excel.

    For more information, refer to the Microsoft Support article Using Solver to determine the optimal product mix.
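
Product-mix problems of the kind Solver handles are linear optimisation problems, so a rough Python analogue can be sketched with scipy.optimize.linprog. The products, profits and capacity limits below are invented for illustration.

```python
from scipy.optimize import linprog

# Hedged analogue of a Solver product-mix problem: choose how many units of two
# products to make so that total profit is maximised, subject to capacity limits.

# Profit per unit: product A $40, product B $30 (negated because linprog minimises).
c = [-40, -30]

# Constraints (A_ub @ x <= b_ub):
#   machine hours: 2*A + 1*B <= 100
#   labour hours:  1*A + 2*B <= 80
A_ub = [[2, 1],
        [1, 2]]
b_ub = [100, 80]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")

units_a, units_b = result.x
print(f"Make {units_a:.0f} units of product A and {units_b:.0f} units of product B")
print(f"Maximum profit: ${-result.fun:,.0f}")  # A=40, B=20, profit $2,200
```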

    Data Tables

    Using data tables makes it easier to examine a range of possibilities at a glance. However, a data table cannot accommodate more than two variables. For more information, refer to the Microsoft Support article Calculate multiple results by using a data table.
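
As a rough Python analogue of a two-variable data table, the sketch below evaluates a single formula over a grid of two inputs (interest rate and loan term) and lays the results out as a table with pandas. The loan amount and ranges are assumptions for illustration.

```python
import pandas as pd

def monthly_repayment(principal: float, annual_rate: float, years: int) -> float:
    """Standard annuity repayment formula for a fixed-rate loan."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)

rates = [0.04, 0.05, 0.06, 0.07]
terms = [15, 20, 25, 30]

# One column per term, one row per interest rate: the same shape as an Excel data table.
table = pd.DataFrame(
    {years: [monthly_repayment(500_000, rate, years) for rate in rates] for years in terms},
    index=[f"{rate:.0%}" for rate in rates],
)
table.index.name = "Annual rate"
table.columns.name = "Term (years)"
print(table.round(2))
```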

    Scenarios

    A scenario is a set of values that Excel saves and can substitute automatically in cells on a worksheet. Scenarios provide the ability to analyse more than two variables.

    For more information, refer to the Microsoft Support article Switch between various sets of values by using scenarios.

    Note: In this unit you are not required to use these tools. However, it is important to know that there are various optimisation and simulation tools that can perform what-if scenario analysis.

    The following video demonstrates how Excel add-ins and tools such as ‘Goal-Seek’ and ‘Solver’ are used to solve business problems and optimise the decision-making process.

    Introduction to Power BI

    For the purposes of this module, you will be introduced to Microsoft Power BI Desktop, a single tool within a technology platform that offers multiple capabilities. Learning to use this tool will help you gain the hands-on skills required for this module.

    What is Power BI?

    According to Microsoft,

    Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. 15

    To learn more, refer to the article What is Power BI? - Power BI from Microsoft Learn. Watch the following video as an introduction to Microsoft Power BI.

    Power BI terminology and basic concepts

    The following are some of the common terms you will come across when using Power BI for data analysis and visualisation.

    • Workspaces
    • Dashboards
    • Reports
    • Datasets
    • Visualisations

    Refer to the following to familiarise yourself with the basic terminology and concepts for using Power BI.

    The following video explains the terminology and basic concepts in Power BI.

    What is Power BI Desktop?

    According to Microsoft,

    Power BI Desktop is a free application you install on your local computer that lets you connect to, transform, and visualise your data. 16

    Refer to the following articles from Microsoft Learn to learn more about Power BI Desktop and how to get started. 

    The following video provides an introduction to Microsoft Power BI Desktop and demonstrates how to get started with using this tool.

    Using Power BI Desktop on a macOS computer

    You are required to perform a variety of tasks using Power BI Desktop in this module. If your personal computer runs the Windows operating system (e.g. Windows 10), you can ignore the following set-up guidelines, as they will not apply to you.

    However, if your personal computer runs macOS, you will not be able to install these applications directly. As an alternative, you will be provided with additional guidelines on how to set up a virtual Windows environment on your macOS computer so that you can follow along with the activities in this module.

    Steps:

    1. Download the VM - Windows 10 (.zip) file to a location on your macOS computer from this link.
    2. Download the Virtual Machine instructions for macOS users document and read through the instructions for setting up the virtual environment.
    3. Watch the following video demonstration of how to work through the steps using the files you have downloaded.

    Knowledge check

    Answer the following two (2) questions. Click the arrows to navigate between questions.

    Topic summary

    Congratulations on completing your learning for this topic Requirements for using big data in business.

    In this topic, you learnt about the following.

    • Domain knowledge of business processes.
    • Legislative requirements for accessing and using big data, including data protection and privacy laws and regulations.
    • Organisational policies and procedures relating to using big data.
    • Technology platforms for big data analytics.

    Check your learning

    The final activity for this topic is a set of questions that will help you prepare for your formal assessment.

    Knowledge check

    Complete the following four (4) questions. Click the arrows to navigate between the questions.

    What’s next?

    Next, we will learn about presenting big data in analytics.
