- 6/16/2025
Welcome to another lesson in the SkillTech DP-900: Microsoft Azure Data Fundamentals Course!
In this video, we explore the core components of a modern data warehouse on Azure. Learn how data is stored, processed, and served for analysis using scalable and cloud-native services.
What You’ll Learn in This Session:
What is a Modern Data Warehouse?
Key components: Data Lake, Data Movement, Storage, Processing, Serving
How Azure Synapse Analytics fits into modern data architecture
Real-world examples of end-to-end data flow in Azure
This video is perfect for:
Beginners in cloud data architecture
Students preparing for the DP-900 Certification Exam
Data professionals migrating to modern cloud solutions
Explore our other Courses and Additional Resources on: https://skilltech.club/
Transcript
00:00 Okay, so now it's time to talk about the last module of this course, which
00:14 is going to cover modern data warehouse analytics. Up to this point we have seen relational and non-relational
00:21 data, but now we are going to focus on data analytics. Within that, we also want to
00:26 understand what a data warehouse is all about and what kind of data analytics services are provided by
00:33 the Azure cloud. In this module we will focus on four different lessons. First we will start with the
00:39 components of a modern data warehouse and the kinds of services involved. Then we will
00:44 explore large-scale data analytics with the Azure cloud, and we will try to deploy some of the
00:49 analytical services, like Azure Data Factory or Azure Synapse, in the Azure portal. Then we will
00:56 move forward to lesson number three, where we are going to start building with Power BI, so we
01:03 will understand how Power BI helps us in this kind of data analysis. Finally, maybe the most
01:10 important lecture for you is going to be the certification roadmap. If you are planning to
01:16 clear the DP-900 Azure certification, what exact preparation you need to do after this course
01:22 and what kind of reading or practical exercises you have to do, all those things I am going to
01:28 share with you in lesson four. In this lecture we are going to focus on the components of a modern data
01:34 warehouse. If you have never heard the term data warehouse, then this video is
01:40 actually for you, because we will understand what kind of components are there in a modern data warehouse and
01:46 the exact role each one plays in the flow. We will start by exploring data warehouse concepts and
01:54 how the data is ingested, then processed, and then used for analysis. After that we will focus on exploring
02:01 Azure data services for the modern data warehouse. Technically there are five different services available in
02:07 Azure that focus on data warehousing, and we will see all five in this course. We will explore
02:14 modern data warehousing architectures and workloads, where we will understand how the data flow is
02:20 organized and managed, and then finally we will focus on exploring Azure data services in the Azure portal.
02:27 Now, to begin with, I first want you to understand the modern data warehouse components. You can see that I have
02:35 a diagram here, an architectural diagram, and I want you to read it from left to right.
02:41 The left side first shows some kind of data, which can be in a file format, in a tabular format
02:48 coming from a database, or in some raw format. This data will be ingested
02:55 into Data Factory. The second icon is Data Factory, which ingests the data
03:02 and then performs ETL or ELT transformations on it. After that transformation,
03:10 we can associate the data with Data Lake Storage, where it will be stored, and then it is ready
03:17 for further analytics processing. Directly from Data Lake Storage, there are two components that connect
03:23 to it. The first one, at the top, is Synapse, and the one below it, in orange,
03:30 is Azure Databricks. Maybe further processing or transformation is required before we can do the analytics.
03:40 In that case we need to prepare the data: we transfer it from Data Lake Storage to Databricks, and Databricks has
03:49 enough compute available in its clusters, from where it can connect directly to Power BI,
03:55 or the data can be provided to Azure Synapse. From Azure Synapse you can connect to data flows,
04:04 or you can connect directly to Power BI dashboards, where we can visualize the data
04:09 so that data engineers and data scientists can make business decisions or gain detailed business insights
04:16 from the data we have analyzed. This full flow shows how data moves
04:23 through the data warehouse components. Each service in this flow can be replaced
04:30 by another service, or as an alternative you can use other cloud services. But in each piece,
04:37 the processing of the data mostly remains the same. Now, if I talk about the Azure services for the
04:43 modern data warehouse, there are mainly five services that you need to understand and utilize.
04:50 The first one is Azure Data Factory. Azure Data Factory is like a real factory. In a real factory
04:57 you take raw materials and create products. In the same way, Azure Data Factory is a service
05:04 that can ingest large amounts of raw, mostly unorganized data, and from this raw and unorganized data
05:12 it produces meaningful information, which is the final product
05:17 of Azure Data Factory. As I mentioned earlier, it allows you to do extract,
05:24 transform, load (ETL) or extract, load, transform (ELT) kinds of transformations, and plenty of
05:31 data integration projects currently use Azure Data Factory. Then we have Azure Data Lake.
05:38 Azure Data Lake: like the name suggests, a data lake is like a huge lake where you can store your data, and
05:45 it allows you to store almost any kind of data, whether structured, unstructured, or semi-structured.
05:51 Ultimately, it is a collection of analytics and storage services that you can combine
05:57 to implement big data solutions. Most of the time, Data Lake has three things inside it:
06:03 Data Lake Store, Data Lake Analytics, and HDInsight, which is also integrated with it.
06:10 We will see that in the next discussion. After this we have Azure Databricks.
06:16 Azure Databricks is the result of an official collaboration between Microsoft Azure and the Databricks platform, and it allows you to
06:22 run your Apache Spark notebooks directly on the dedicated compute power provided by a Databricks cluster.
06:31 It is an analytics platform optimized for Microsoft Azure cloud services, and Databricks was founded by the original creators of Apache Spark.
06:39 Databricks is integrated with Azure to provide one-click setup, so you don't need to worry about deploying the clusters,
06:47 monitoring the jobs, or managing what is running inside Spark.
06:52 It also enables data scientists, data engineers, and even business analysts to collaborate in one place.
06:59 Then we have Azure HDInsight. This is a managed analytics service in the cloud that is based on Apache Hadoop.
07:07 It is a collection of open-source tools and utilities that enables you to run processing tasks over large amounts of data.
07:16 If you are a Hadoop developer or a Hadoop data engineer, you can easily work with Azure HDInsight,
07:22 because the same concepts apply inside it.
07:25 HDInsight uses a cluster model similar to Synapse Analytics, which we will see next.
07:32 So architecturally they have similarities, but HDInsight is built specifically for big data workloads based on Hadoop.
07:40 HDInsight stores data using Azure Data Lake Storage.
07:44 So somewhere in the background, it is also using Azure Data Lake.
07:47 And you can use HDInsight to analyze data using frameworks such as Hadoop MapReduce, Apache Spark, Apache Hive, Apache Kafka, R, or Apache Storm.
07:59 And then finally we have Azure Synapse Analytics, which is one of the most popular analytics services in recent times.
08:07 Azure Synapse Analytics is a general-purpose analytics service.
08:11 You can use it to read data from many sources and process the data using different kinds of compute, which can be integrated through HDInsight,
08:20 Databricks, Data Lake, or even Azure Functions.
08:23 Then you can generate various analyses and models and save the results.
08:29 For further processing, if you want to integrate that analytics data with machine learning or AI,
08:35 you can also integrate Synapse Analytics with Azure Machine Learning Studio.
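The left-to-right flow from the architecture diagram described earlier (data sources, Data Factory, Data Lake Storage, Databricks or Synapse, Power BI) can be reduced to a toy, in-memory sketch. Every function name and the sample data below are hypothetical; nothing here calls an Azure API, and each stage is collapsed to a few lines so that only the hand-off between stages is visible.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical stage functions; none of these are Azure APIs.

def ingest(sources: Dict[str, List[dict]]) -> List[dict]:
    """Data Factory stage: pull raw records from every source into one list."""
    records = []
    for source_name, rows in sources.items():
        for row in rows:
            records.append({**row, "source": source_name})
    return records

def store(lake: Dict[str, List[dict]], zone: str, records: List[dict]) -> Dict[str, List[dict]]:
    """Data Lake stage: land the records under a named zone (a folder in a real lake)."""
    lake[zone] = list(records)
    return lake

def process(records: List[dict]) -> Dict[str, float]:
    """Databricks/Synapse stage: aggregate sales per region for reporting."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

def serve(totals: Dict[str, float]) -> List[str]:
    """Power BI stage, reduced to a sorted text report."""
    return [f"{region}: {total:.2f}" for region, total in sorted(totals.items())]

# Hypothetical sample data from two sources.
sources = {
    "sales_db": [{"region": "East", "amount": 120.0}, {"region": "West", "amount": 80.0}],
    "csv_files": [{"region": "East", "amount": 30.0}],
}
lake = store({}, "raw", ingest(sources))
report = serve(process(lake["raw"]))
print(report)  # ['East: 150.00', 'West: 80.00']
```

Because each stage only consumes the previous stage's output, any one of them could be swapped for a different service, which matches the point in the lecture that individual services in the flow are replaceable while the overall processing order stays the same.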
08:40 Now let's take a deep dive into each of the services so that we can understand them better.
08:45 The first one is Azure Data Factory.
08:48 As we said, this is a data integration service that takes raw data, processes it, and produces meaningful data from it.
08:58 So, Azure Data Factory is a service that can ingest large amounts of raw, unorganized data from relational and non-relational systems,
09:05 and it converts that data into meaningful information.
09:09 To do this, there are a few terms we need to understand inside Data Factory:
09:15 linked services, datasets, pipelines, and activities.
09:21 We are going to see these in the demos of this same module.
09:24 During the transformation process, Azure Data Factory filters out noise and keeps the interesting data.
09:31 So maybe you have a huge amount of bulk data collected from multiple sources, and you only want to take the meaningful data from it.
09:41 Those kinds of transformations are done by Azure Data Factory.
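The "filter out noise, keep the interesting data" step, and the ETL versus ELT ordering mentioned earlier, can be illustrated with a small sketch. The sample rows, the `is_interesting` rule, and the `etl`/`elt` helpers are all hypothetical; in Data Factory this logic would live in pipeline activities rather than Python functions. The only difference between the two helpers is whether the transformation runs before loading (ETL) or after the raw data has landed in the destination (ELT).

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical raw rows collected from several sources; two of them are noise.
raw_rows: List[Row] = [
    {"id": 1, "temp_c": "21.5", "status": "ok"},
    {"id": 2, "temp_c": "", "status": "ok"},        # missing reading
    {"id": 3, "temp_c": "19.0", "status": "test"},  # test record
    {"id": 4, "temp_c": "23.25", "status": "ok"},
]

def is_interesting(row: Row) -> bool:
    """Keep only real readings from production devices."""
    return row["status"] == "ok" and row["temp_c"] != ""

def transform(rows: List[Row]) -> List[Row]:
    """Filter out noise and convert the reading to a number."""
    return [{"id": r["id"], "temp_c": float(r["temp_c"])} for r in rows if is_interesting(r)]

def etl(extract: Callable[[], List[Row]], warehouse: List[Row]) -> List[Row]:
    """ETL: transform in flight, then load only the cleaned rows."""
    warehouse.extend(transform(extract()))
    return warehouse

def elt(extract: Callable[[], List[Row]], lake: List[Row]) -> List[Row]:
    """ELT: load the raw rows first, then transform them inside the destination."""
    lake.extend(extract())
    return transform(lake)

warehouse = etl(lambda: raw_rows, [])
lake: List[Row] = []
curated = elt(lambda: raw_rows, lake)
print(warehouse)  # [{'id': 1, 'temp_c': 21.5}, {'id': 4, 'temp_c': 23.25}]
print(len(lake))  # 4
```

Both approaches end with the same curated rows; ELT additionally keeps the untouched raw copy in the destination, which is one reason it pairs naturally with a data lake.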
09:45 Azure Data Factory works on triggers.
09:47 Whatever pipelines you define inside it, each of which can contain a series of tasks,
09:51 you can trigger manually, or you can configure them to be triggered automatically on a schedule or on certain events.
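To make the four Data Factory terms and the idea of triggers concrete, here is a deliberately simplified Python model. The classes `LinkedService`, `Dataset`, `Activity`, and `Pipeline` and the `schedule_times` helper are hypothetical teaching stand-ins, not the Azure Data Factory SDK. They only show how the pieces relate: a dataset points at data through a linked service, a pipeline runs its activities in order, and the same pipeline can be run manually or by a recurring schedule.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Hypothetical, simplified model of Data Factory concepts; not the ADF SDK.

@dataclass
class LinkedService:
    """Connection information for an external store (like a connection string)."""
    name: str
    connection: str

@dataclass
class Dataset:
    """A named reference to data reachable through a linked service."""
    name: str
    linked_service: LinkedService
    path: str

@dataclass
class Activity:
    """One task in a pipeline; it reads and updates a shared context dict."""
    name: str
    run: Callable[[Dict], None]

@dataclass
class Pipeline:
    """An ordered series of activities that runs whenever it is triggered."""
    name: str
    activities: List[Activity] = field(default_factory=list)
    run_log: List[str] = field(default_factory=list)

    def trigger(self, reason: str) -> Dict:
        context: Dict = {}
        for activity in self.activities:  # activities run one after another
            activity.run(context)
        self.run_log.append(f"{self.name} run ({reason})")
        return context

def schedule_times(start: datetime, every: timedelta, count: int) -> List[datetime]:
    """Times at which a simple recurring schedule trigger would fire."""
    return [start + i * every for i in range(count)]

blob = LinkedService("blob_store", "<connection-string>")
raw_sales = Dataset("raw_sales", blob, "raw/sales.csv")

# The "copy" activity pretends to read three values from the dataset.
copy = Activity("copy", lambda ctx: ctx.update(source=raw_sales.path, rows=[5, 0, 7]))
clean = Activity("filter", lambda ctx: ctx.update(rows=[r for r in ctx["rows"] if r > 0]))
pipeline = Pipeline("daily_sales", [copy, clean])

result = pipeline.trigger("manual")  # {'source': 'raw/sales.csv', 'rows': [5, 7]}
for t in schedule_times(datetime(2025, 6, 16), timedelta(days=1), 3):
    pipeline.trigger(f"schedule {t:%Y-%m-%d}")
```

In the real service, triggers are separate resources attached to pipelines, and besides schedules there are event-based options; the loop over `schedule_times` only imitates what a daily schedule would do.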
09:58 Now, let's talk about Azure Data Lake Storage.
10:02 It organizes your files into directories and subdirectories for improved file organization.
10:09 So ultimately, this is something like your Azure Storage account with Blob Storage support in it.
10:14 But in addition, Azure Data Lake Storage provides granular security over data using access control lists.
10:21 This is an additional advantage of Data Lake Storage that is not there in a normal Azure Storage account.
10:28 When we created an Azure Storage account, we saw that there is an option to create the storage account with Data Lake Storage Gen2 enabled,
10:40 which gives me a hierarchical namespace and this additional security coming from access control lists.
10:46 An access control list specifies which users and accounts can access which files and folders in the store.
10:55 If you are more familiar with Linux, you can use POSIX (Portable Operating System Interface)
11:04 style permissions to grant read, write, and execute permissions.
11:10 Or if you are familiar with Azure, then you are surely familiar with role-based access control (RBAC), which allows you to assign a role to a user or a service principal.
11:23 Based on that role, you can control what kind of access permissions are granted.
11:28 You can use either or both of these with Azure Data Lake Storage.
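The POSIX-style permissions described above work per directory and per file. To read a file, a principal needs execute (`x`) on every parent folder to traverse to it, plus read (`r`) on the file itself. The sketch below checks exactly that rule against a hypothetical in-memory ACL table. It is a simplification and an assumption-laden model: real ACLs also involve the owning user and group, "other", a mask, and default ACLs, and Azure RBAC role assignments are evaluated before ACLs.

```python
from typing import Dict, Iterator

# Simplified ACL table: path -> {principal: "rwx"-style permission string}.
# Real ADLS Gen2 ACLs also involve owning user/group, "other", a mask and default ACLs,
# and Azure RBAC role assignments are checked before ACLs. All of that is omitted here.
acls: Dict[str, Dict[str, str]] = {
    "/": {"analyst": "--x", "engineer": "rwx"},
    "/sales": {"analyst": "--x", "engineer": "rwx"},
    "/sales/2025": {"analyst": "r-x", "engineer": "rwx"},
    "/sales/2025/june.csv": {"analyst": "r--", "engineer": "rw-"},
    "/hr": {"engineer": "rwx"},
    "/hr/salaries.csv": {"analyst": "r--", "engineer": "rw-"},
}

def has(path: str, principal: str, perm: str) -> bool:
    """True if the principal's entry on `path` includes `perm` ('r', 'w' or 'x')."""
    entry = acls.get(path, {}).get(principal, "---")
    return entry["rwx".index(perm)] == perm

def parent_dirs(path: str) -> Iterator[str]:
    """Yield '/', then every directory between the root and the file."""
    yield "/"
    current = ""
    for part in path.strip("/").split("/")[:-1]:
        current += "/" + part
        yield current

def can_read_file(principal: str, path: str) -> bool:
    # Execute (x) on every parent directory is needed to traverse down to the file...
    if not all(has(d, principal, "x") for d in parent_dirs(path)):
        return False
    # ...and read (r) on the file itself is needed to read its contents.
    return has(path, principal, "r")

print(can_read_file("analyst", "/sales/2025/june.csv"))  # True
print(can_read_file("analyst", "/hr/salaries.csv"))      # False: no x on /hr
print(can_read_file("engineer", "/hr/salaries.csv"))     # True
```

The second call shows why the hierarchy matters: the analyst has `r` on the salaries file itself, but without `x` on `/hr` the file cannot be reached.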
11:31 Azure Data Lake Storage is also compatible with the Hadoop Distributed File System (HDFS).
11:37 As you know, Hadoop is a highly flexible and programmable analysis service that many organizations use to examine large quantities of data.
11:47 All Apache Hadoop environments can access the data stored in Azure Data Lake Storage Gen2.
11:53 Beyond Data Lake Storage Gen2 itself, it is worth your time to research the differences between Azure Data Lake Storage Gen1 and Gen2, because knowing this storage well is going to give you an advantage in the certification exam.
12:10 Now let's talk about Azure Databricks, which is an Apache Spark-based platform that provides big data processing and streaming.
12:19 There may be certain transformations that you cannot do with good performance in Azure Data Factory, and for those situations you will come to Databricks,
12:29 because Databricks gives you a large amount of compute, where a cluster of computers processes the data in parallel as fast as possible.
12:39 It also simplifies the provisioning and collaboration of Apache Spark-based analytical solutions,
12:46 because you don't need to worry about deploying clusters manually or managing the jobs you run on them.
12:52 All these things are taken care of by the Azure Databricks service itself.
12:56 It also integrates with the security capabilities of Azure.
12:59 It can generate tokens for authentication and authorization,
13:03 and you can also use Azure role-based access control integrations with Azure Databricks.
13:10 It is also integrated with a variety of Azure data platform services, like Power BI, so that you can visualize the results.
13:17 Ultimately, Azure Databricks allows you to create Apache Spark notebooks where you can write your code in languages like Python, Scala, or R, or you can even use SQL.
13:31 All these notebooks, and the cells inside them, execute on an Azure Databricks cluster,
13:38 and you pay for those clusters based on the dedicated compute you have configured for them.
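The reason a cluster speeds things up is that the data is split into partitions, the same task runs on every partition in parallel, and the partial results are then combined. The sketch below imitates that split, compute, and combine pattern on a single machine, using Python's `ThreadPoolExecutor` as a stand-in for cluster workers. This is a swapped-in illustration of the idea, not Spark or Databricks code, and the sensor events are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List

def partition(rows: List[dict], n: int) -> List[List[dict]]:
    """Split the rows round-robin into n partitions, one per worker."""
    return [rows[i::n] for i in range(n)]

def count_by_device(part: List[dict]) -> Counter:
    """The work done on one worker: count events per device in its partition."""
    return Counter(row["device"] for row in part)

def cluster_count(rows: List[dict], workers: int = 4) -> Counter:
    """Run the same task on every partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_by_device, partition(rows, workers)))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # the "driver" combines what each worker produced
    return total

# Hypothetical sensor events: 10 readings spread across 3 devices.
events = [{"device": f"sensor-{i % 3}"} for i in range(10)]
counts = cluster_count(events)
print(counts["sensor-0"], counts["sensor-1"], counts["sensor-2"])  # 4 3 3
```

The result is the same whatever the number of workers; only the amount of work per worker changes, which is the property a real Spark cluster relies on when it scales out.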
13:44 Now let's talk about what Azure Analysis Services is all about,
13:48 because we have already seen that there is a service called Azure Synapse.
13:53 What Analysis Services does, and the flow we have to understand, is on screen right now.
13:59 Azure Analysis Services enables you to build tabular models to support online analytical processing (OLAP) queries.
14:08 It can combine data from multiple sources, including Azure SQL Database, Synapse Analytics, Data Lake Storage, or Cosmos DB.
14:17 It can connect to any of those sources,
14:19 get data from them, and then process it further for analysis.
14:23 Analysis Services can take data from many kinds of data sources.
14:28 That's why, as you can see in the diagram, on the left side of Analysis Services we have the data sources:
14:33 ingestion is happening from Blob Storage, from an on-premises network, or from Azure Synapse,
14:40 where the data is stored in a data warehouse format.
14:43 We take the data from there into Analysis Services, and after the analysis we may need to handle authentication.
14:51 Once the authentication is done, we can visualize the data with the help of Power BI.
14:55 This full flow allows you to process and analyze the data in such a way that it gives meaningful insights for your business processes.
15:05 You can not only visualize the results in Power BI but, based on that visualization, also make business decisions.
15:14 If you are wondering what Azure Synapse is all about, we are going to talk about it in detail later in this module.
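An OLAP query against a tabular model typically slices the data by some dimension values and then aggregates a measure grouped by other dimensions. The sketch below shows that pattern in plain Python on a hypothetical sales fact table. The `total_amount` function plays the role of a "Total Amount" measure. It is only an illustration of the query shape, not how Analysis Services stores or evaluates models (real models are queried with languages such as DAX or MDX).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical fact table: one row per sale, with dimension columns and a numeric value.
sales: List[dict] = [
    {"year": 2024, "region": "East", "product": "Bikes", "amount": 100.0},
    {"year": 2024, "region": "West", "product": "Bikes", "amount": 60.0},
    {"year": 2025, "region": "East", "product": "Helmets", "amount": 40.0},
    {"year": 2025, "region": "East", "product": "Bikes", "amount": 90.0},
]

def total_amount(rows: List[dict], by: Tuple[str, ...],
                 where: Optional[Dict[str, object]] = None) -> Dict[tuple, float]:
    """A 'Total Amount' measure: slice the rows with `where`, then group by the `by` dimensions."""
    where = where or {}
    totals: Dict[tuple, float] = defaultdict(float)
    for row in rows:
        if all(row[key] == value for key, value in where.items()):
            totals[tuple(row[dim] for dim in by)] += row["amount"]
    return dict(totals)

print(total_amount(sales, by=("year",)))  # {(2024,): 160.0, (2025,): 130.0}
print(total_amount(sales, by=("region",), where={"year": 2025}))  # {('East',): 130.0}
```

Changing only the `by` and `where` arguments gives a different view of the same data, which is the kind of slicing a Power BI report performs on top of a tabular model.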
15:22 Now the last important piece is Azure HDInsight.
15:26 Azure HDInsight is a big data processing service that allows you to use open-source libraries on one platform in the Azure environment.
15:35 So all the Hadoop utilities of a big data environment are available in Azure HDInsight, with managed clusters (for example, Spark clusters) running in the background.
15:45 You can use the compute power of these clusters, and you can get your data in Hadoop format or in normal file formats from various data sources.
15:55 You can also bring in data coming from IoT devices and sensors.
16:02 You can integrate HDInsight with notebooks, such as the ones available in Azure Databricks,
16:08 and you can also connect it with analytical services.
16:12 So HDInsight uses a clustered model similar to that of Synapse Analytics.
16:18 In a way, it is one of the nearest competitors of Azure Synapse Analytics, and in most projects people choose either HDInsight or Synapse.
16:29 Which one they use totally depends on the team's background,
16:34 the technologies they are working with, and the technologies they have selected in their application architecture.
16:42 You can use HDInsight to analyze data using frameworks such as Hadoop, Apache Spark, Apache Hive, Apache Kafka, and others.
16:51 That gives me one basic reason: if I have a background in Hadoop data systems and I want to integrate the same things with the Azure cloud,
16:59 then HDInsight is going to be the right choice for me.
17:03 But suppose I am not dealing with Hadoop-style big data systems and I only want to use this as an analytical service
17:11 at extremely large scale; then maybe I will go for Azure Synapse Analytics in that case.
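Hadoop MapReduce, one of the HDInsight frameworks mentioned above, splits a job into a map phase, a shuffle that groups values by key, and a reduce phase. The classic word-count example below runs all three phases in a single Python process on hypothetical log lines, just to show what each phase does. On a real Hadoop cluster the mappers and reducers would run on different nodes, and the framework would handle the shuffle between them.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

def map_phase(line: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every value emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

# Hypothetical log lines; on a real cluster each mapper would get its own input split.
logs = ["error disk full", "warning disk slow", "error network down"]
counts = word_count(logs)
print(counts["error"], counts["disk"], counts["slow"])  # 2 2 1
```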
17:19 So in this video we have seen which components are there in modern data warehouse systems,
17:25 and specifically we have discussed the Azure services offered for this scenario.