Maximizing HDInsight throughput to Azure Blob Storage
The HDInsight service supports both HDFS and Windows Azure Storage (Blob service) for storing data. Using Blob storage with HDInsight gives you low-cost, redundant storage and lets you scale storage independently of compute. However, Windows Azure Storage allocates a limited amount of bandwidth to each storage account, and a sufficiently large HDInsight cluster can exceed it. When that happens, Windows Azure Storage throttles requests. This article describes when throttling may occur and how to maximize throughput to Blob storage by avoiding it.
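As a rough illustration of the idea, the sketch below compares a cluster's aggregate demand against a storage account's bandwidth allocation. It is a back-of-the-envelope model, not how Windows Azure Storage actually decides to throttle: the function names, the per-task throughput, and the account limit are all hypothetical inputs that you would replace with measured or documented values.

```python
def max_unthrottled_tasks(account_limit_mbps: float, per_task_mbps: float) -> int:
    """Largest number of concurrent tasks whose combined throughput
    stays within the account limit (both rates are assumed inputs)."""
    if account_limit_mbps <= 0 or per_task_mbps <= 0:
        raise ValueError("both rates must be positive")
    return int(account_limit_mbps // per_task_mbps)


def is_throttling_likely(nodes: int, tasks_per_node: int,
                         per_task_mbps: float,
                         account_limit_mbps: float) -> bool:
    """True when the cluster's aggregate demand exceeds the account limit.

    Assumes every task reads or writes at per_task_mbps simultaneously,
    which is a worst-case simplification.
    """
    aggregate_mbps = nodes * tasks_per_node * per_task_mbps
    return aggregate_mbps > account_limit_mbps
```

With placeholder numbers, `is_throttling_likely(10, 4, 50.0, 1000.0)` compares 2000 units of demand against a limit of 1000. The same check shows why a larger cluster can cross the line while a smaller one with identical per-task behavior does not.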
Insights on HDInsight
I think it’s about time I dust off this blog and realign it with my current focus: HDInsight. I’ve been heads-down since February (when I joined the HDInsight team) learning about “big data” and Hadoop. I haven’t had much time for writing, but I’m hoping to change that. I’ve learned quite a bit in the last few months, and I find that writing is the best way to solidify what I’ve learned (not to mention share it). If you have topics you’d like to see covered, let me know in the comments or on Twitter (@brian_swan) – I’ll do what I can to cover them.