A few weeks ago, I wrote a post that described how to maximize throughput between HDInsight clusters and Windows Azure Storage. One of the suggestions I made was to adjust your HDInsight cluster’s self-throttling mechanism, that is, to tune the fs.azure.selfthrottling.read.factor and fs.azure.selfthrottling.write.factor parameters. I also suggested that the best way to find the optimal parameter values is to turn on storage account logging and analyze the logs after you have run a job or two. This post describes how to use a new command-line tool (available as part of the .NET SDK for Hadoop) that makes analyzing storage account logs easy.
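To give a rough idea of what analyzing these logs involves, here is a minimal Python sketch that tallies request statuses from storage analytics log entries and reports the fraction that were throttled. This is not the command-line tool from the .NET SDK for Hadoop. It assumes the version 1.0 log format, in which fields are separated by semicolons and the fourth field is the request status. The function names and the sample log lines are made up for illustration, and the sample lines are shortened to the first few fields.

```python
import csv
from collections import Counter


def count_request_statuses(log_lines):
    """Tally the request-status field of each log entry.

    Assumes the 1.0 log format: semicolon-delimited fields, with
    request-status as the 4th field (index 3). Quoted fields are
    handled by the csv module.
    """
    counts = Counter()
    for row in csv.reader(log_lines, delimiter=";", quotechar='"'):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        counts[row[3]] += 1
    return counts


def throttled_fraction(counts):
    """Return the share of requests whose status mentions throttling."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    throttled = sum(n for status, n in counts.items()
                    if "ThrottlingError" in status)
    return throttled / total


# Hypothetical, shortened sample entries (not real log data).
sample_lines = [
    "1.0;2013-09-01T10:00:00.0000000Z;GetBlob;Success;200;12;10",
    "1.0;2013-09-01T10:00:01.0000000Z;PutBlock;Success;201;30;25",
    "1.0;2013-09-01T10:00:02.0000000Z;PutBlock;ThrottlingError;503;3;2",
    "1.0;2013-09-01T10:00:03.0000000Z;GetBlob;Success;200;11;9",
]

status_counts = count_request_statuses(sample_lines)
fraction = throttled_fraction(status_counts)
print(dict(status_counts), fraction)
```

If a noticeable share of requests comes back throttled after a job, that suggests lowering the self-throttling factors; if none are throttled, there may be room to raise them.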
One of the questions the HDInsight team sees a lot is some variation of “How do I figure out what went wrong when something does go wrong?” If you are familiar with Hadoop, you are probably also familiar with rolling up your sleeves and digging into Hadoop logs to answer this question. However, we’ve found that many folks using HDInsight don’t realize that much of the logging information they are accustomed to is just as easily available for HDInsight clusters. This is a quick post to outline the types of logs that are written to your Azure storage account when you spin up an HDInsight cluster: