Azure Storage is Microsoft's cloud storage solution, and it offers several data models: Blobs, Tables, Queues, File Shares, and Disks. To learn the differences between them, how to use them, their high availability options, and the overall Azure Storage architecture, I highly recommend checking this learning path; you will find a lot of tips and articles about Azure Storage there.

Now let us look at storage technology in the context of big data: what is the best option for our data, how do we choose it, and based on what criteria? Let us think about it from an architecture and design point of view.

Microsoft provides a variety of options for storing big data, depending on your requirements and on the type of data. In this article we will discuss four options that can be used as Azure storage solutions for big data:

  • Azure Blob Storage
  • Azure Data Lake Storage
  • Azure Cosmos DB
  • Apache HBase on Azure HDInsight

Azure Blob Storage

Blobs (binary large objects) are a very helpful solution for storing big data, or even small data. Blob Storage offers three access tiers (Hot, Cool, and Archive), and you can use it to store files such as audio and video. It is very similar to Azure Data Lake Storage; the big difference between Azure Blob Storage and Azure Data Lake Storage, especially Gen2, is the hierarchical namespace. With a hierarchical namespace, the account organizes your objects (data/files) into a directory hierarchy, just like File Explorer on your PC. For more information about the Azure Data Lake Storage Gen2 hierarchical namespace, check the Microsoft documentation, which is the best place to find accurate information. In Azure Blob Storage, by contrast, the namespace is flat: you create a container and store a set of blobs in it.
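To make the flat-versus-hierarchical difference described above concrete, here is a toy model in plain Python. It is not the Azure SDK: the functions `rename_prefix_flat` and `rename_dir_hns` and the `Directory` class are hypothetical names for illustration. It only counts the operations needed to rename a top-level "folder". In a flat namespace a folder is just a shared name prefix, so every blob under it must be moved (copied, then deleted) one by one; with a hierarchical namespace the directory is a real node, so a rename touches one object.

```python
from typing import Dict, Union

# Toy model only (not the Azure SDK): it counts the operations needed to
# rename a top-level "folder" under each namespace model.


def rename_prefix_flat(blobs: Dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace: a 'folder' is just a shared name prefix, so every blob
    under it has to be re-created under a new name (copy, then delete)."""
    moved = 0
    # Snapshot the matching names first, because the dict is mutated below.
    for name in [n for n in blobs if n.startswith(old + "/")]:
        blobs[new + name[len(old):]] = blobs.pop(name)
        moved += 1
    return moved


class Directory:
    """Hierarchical namespace: a directory is a real node in a tree."""

    def __init__(self) -> None:
        self.children: Dict[str, Union["Directory", bytes]] = {}


def rename_dir_hns(root: Directory, old: str, new: str) -> int:
    """Renaming re-links one node, however many files sit below it."""
    root.children[new] = root.children.pop(old)
    return 1


flat = {f"raw/2024/file{i}.csv": b"" for i in range(1000)}
print(rename_prefix_flat(flat, "raw", "curated"))  # 1000

root = Directory()
raw = Directory()
raw.children.update({f"file{i}.csv": b"" for i in range(1000)})
root.children["raw"] = raw
print(rename_dir_hns(root, "raw", "curated"))  # 1
```

The point of the sketch is the cost shape: folder-level operations grow with the number of files in a flat namespace, and stay constant when directories are first-class objects.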

Azure Data Lake Storage (ADLS)

ADLS is a fully managed service provided by Microsoft for storing and retrieving your data. Azure Data Lake Storage comes in two generations, Gen1 and Gen2. Gen2 is the newer generation: like Gen1, it lets you store files in directories, but it adds many features and capabilities such as security, high availability, and scaling (check this article to learn 10 things about ADLS Gen2).

Azure Data Lake Storage is built for Hadoop workloads and is HDFS-compatible, so Hadoop tools can work with it directly. With ADLS you get virtually unlimited storage, and you can easily analyze your data with Hadoop frameworks such as Hive, or with Azure Data Lake Analytics, where you query your data using U-SQL, a language that combines declarative SQL with C# (check this article for more information on how to use it). ADLS lets you store data coming from multiple sources; from there it can be moved for transformation or processing, for example with Azure Databricks.
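Because ADLS is HDFS-compatible, engines such as Hive, Spark, or Azure Databricks address data in it through a file system URI rather than a local path. The helper below is a small sketch that builds those URIs: `abfss://<container>@<account>.dfs.core.windows.net/<path>` for Gen2 (the ABFS driver over TLS) and `adl://<account>.azuredatalakestore.net/<path>` for Gen1. The function name `adls_uri` and the account, container, and path values in the example are hypothetical.

```python
from typing import Optional


def adls_uri(account: str, path: str, gen: int = 2,
             container: Optional[str] = None) -> str:
    """Build the URI that Hadoop-compatible engines use to address a file
    in Azure Data Lake Storage (hypothetical helper for illustration)."""
    path = path.lstrip("/")
    if gen == 2:
        if not container:
            raise ValueError("ADLS Gen2 paths need a container (file system) name")
        # Gen2 is reached through the ABFS driver; 'abfss' is the TLS scheme.
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
    if gen == 1:
        # Gen1 uses the 'adl' scheme and its own endpoint.
        return f"adl://{account}.azuredatalakestore.net/{path}"
    raise ValueError("gen must be 1 or 2")


print(adls_uri("mydatalake", "/raw/sales/2024.csv", container="bronze"))
# abfss://bronze@mydatalake.dfs.core.windows.net/raw/sales/2024.csv
```

A URI like this is what you would pass to a Hive external table location or a Spark read, which is how "HDFS-compatible" shows up in day-to-day use.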

Azure Cosmos DB

Before talking about it, I highly recommend checking this article for the basics of Cosmos DB and how to provision it. Azure Cosmos DB is a non-relational (NoSQL) database offered as a fully managed database as a service, and its most important feature is that it is a globally distributed, multi-model database: it supports multiple APIs, including the SQL API, MongoDB API, Cassandra API, Table API, and Gremlin (graph) API. Microsoft backs it with an availability SLA of up to 99.999% for accounts replicated across multiple regions, and it can run across multiple geographic regions, which is one of its most important features. For more information, see the Azure Cosmos DB documentation on multi-region high availability.
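To see what the extra "9" in an availability SLA means in practice, the short calculation below converts a percentage into the maximum downtime it allows over a year. This is a rough illustration only: it ignores leap years, and Azure SLAs are actually measured per billing month, so treat the numbers as an order-of-magnitude comparison between single-region and multi-region setups.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(sla_percent: float) -> float:
    """Downtime budget per year allowed by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


for sla in (99.99, 99.999):
    print(f"{sla}% -> {max_downtime_minutes(sla):.2f} minutes/year")
# 99.99% -> 52.56 minutes/year
# 99.999% -> 5.26 minutes/year
```

Going from 99.99% to 99.999% cuts the downtime budget by a factor of ten, from under an hour to about five minutes per year.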

Apache HBase on Azure HDInsight

HDInsight HBase is a managed cluster integrated with Azure Storage: it lets you store big data directly in Azure Storage, and its most interesting features are low latency and low cost. Apache HBase is an open-source NoSQL database built on Hadoop. For more information, see the Apache HBase documentation and the Azure HDInsight documentation.
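A large part of HBase's low latency comes from its data model: a sparse table whose rows are kept sorted by row key, with columns grouped into column families, so a lookup by key or a scan over a key range only touches the relevant rows. The class below is a conceptual, in-memory sketch of that model, not the HBase client API; `ToyHBaseTable` and the `sensor42#...` key design are hypothetical, and real HBase also versions each cell by timestamp, which this sketch omits.

```python
import bisect


class ToyHBaseTable:
    """Conceptual model only (not the HBase client API): cells live in a
    sparse map sorted by row key, with columns grouped into families."""

    def __init__(self, families):
        self.families = set(families)
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self._rows:
            bisect.insort(self._keys, row)  # keep keys sorted on insert
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        return self._rows.get(row, {})

    def scan(self, start, stop):
        """Return rows with start <= key < stop, like an HBase range scan."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


t = ToyHBaseTable(["reading"])
t.put("sensor42#2024-01-01T00:00", "reading:temp", 21.5)
t.put("sensor42#2024-01-01T01:00", "reading:temp", 22.0)
t.put("sensor7#2024-01-01T00:00", "reading:temp", 19.0)

# '$' sorts right after '#', so this range covers every row of sensor42.
print([k for k, _ in t.scan("sensor42#", "sensor42$")])
# ['sensor42#2024-01-01T00:00', 'sensor42#2024-01-01T01:00']
```

The design choice to notice is the row key: putting the entity id first and the time second keeps all readings for one sensor next to each other, so a single range scan retrieves them.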

Which Solution is a proper fit for your project?

If you are designing a new project architecture, you will need to answer this question, and to answer it you need complete information about the following points:

  • What type of data will you store?
  • How much data will you write to your storage?
  • How much data will you move out of your storage?
  • How large is the total data set you need to store?
  • Will the data be accessed from one location or from multiple locations?
  • What type of high availability do you need to configure for your data?
  • What are the performance requirements when you load data into your storage and query it?
  • How will the data be accessed: by other applications or directly by end users?

Collecting all of this information will guide you to the right decision about which solution is a proper fit for your application and project. Microsoft's Azure Architecture Center documentation also includes a capability matrix for these solutions, which gives you full visibility into the use cases of each one (Azure Storage, ADLS, Cosmos DB, and HBase).
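One way to keep the checklist above honest during a design review is to record the answers in a small structure and see which questions are still open. The sketch below does that; the `Requirements` fields and the `shortlist` rules are hypothetical, and the rules are drawn only from the descriptions in this article (files go to Blob Storage or ADLS, multi-region access points to Cosmos DB, low latency points to HBase). They are a starting point for discussion, not Microsoft's capability matrix.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Requirements:
    """Hypothetical record of the answers to the questions above.
    None means the question has not been answered yet."""
    data_type: Optional[str] = None            # e.g. "files"
    ingest_gb_per_day: Optional[float] = None  # data written to storage
    egress_gb_per_day: Optional[float] = None  # data moved out of storage
    dataset_tb: Optional[float] = None         # total size of the data set
    multi_region_access: Optional[bool] = None
    availability_target: Optional[str] = None  # e.g. "single region"
    needs_low_latency: Optional[bool] = None   # performance requirement
    accessed_by: Optional[str] = None          # "applications" or "end users"

    def unanswered(self) -> List[str]:
        # Every question that is still open before a decision is made.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


def shortlist(req: Requirements) -> List[str]:
    """Illustrative rules taken only from this article's descriptions;
    they are NOT Microsoft's capability matrix."""
    candidates: List[str] = []
    if req.data_type == "files":      # e.g. audio, video, raw data files
        candidates += ["Azure Blob Storage", "Azure Data Lake Storage"]
    if req.multi_region_access:       # globally distributed access
        candidates.append("Azure Cosmos DB")
    if req.needs_low_latency:         # low-latency reads and writes
        candidates.append("Apache HBase on HDInsight")
    return candidates


req = Requirements(data_type="files", multi_region_access=False)
print(shortlist(req))  # ['Azure Blob Storage', 'Azure Data Lake Storage']
print(req.unanswered())
```

In practice the `unanswered()` list is the more useful output: a shortlist built while half the questions are still open is not a decision yet.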

