
Databricks Integration with Azure Application Insights (Log4j) Logs

In the cloud era, we often need to capture our application logs in a central location.

We will use Azure Application Insights to collect and analyze application logs.

Reference –

https://github.com/AnalyticJeremy

https://medium.com/analytics-vidhya/configure-azure-data-bricks-to-send-events-to-application-insights-simplified-c6effbc3ed6a

I would like to thank Jeremy Peach for creating such a useful listener, and Balamurugan Balakreshnan for the Medium post.

This post is a simplified summary of sending Spark Databricks application logs to Azure Application Insights.

Step 1 – Get the required credentials. We need three things (an optional Azure CLI lookup is sketched after the list).

 Application Insights > Instrumentation Key  
 Log Analytics workspace > Workspace ID  
 Log Analytics workspace > Agents management > Linux servers > Primary key  
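
If you prefer the command line to the portal, the same three values can be looked up with the Azure CLI. This is only a sketch: the resource and group names (my-app-insights, my-workspace, my-rg) are placeholders, and the app-insights command assumes the application-insights CLI extension is installed.

 # Instrumentation key of the Application Insights resource  
 az monitor app-insights component show --app my-app-insights --resource-group my-rg --query instrumentationKey -o tsv  

 # Workspace ID of the Log Analytics workspace  
 az monitor log-analytics workspace show --resource-group my-rg --workspace-name my-workspace --query customerId -o tsv  

 # Primary key of the Log Analytics workspace  
 az monitor log-analytics workspace get-shared-keys --resource-group my-rg --workspace-name my-workspace --query primarySharedKey -o tsv  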

Step 2 – Clone the git repo

 git clone https://github.com/AnalyticJeremy/Azure-Databricks-Monitoring.git  

Step 3 – Edit the appinsights_logging_init.sh file inside the code folder and add your keys:

 APPINSIGHTS_INSTRUMENTATIONKEY="--------------------"  
 LOG_ANALYTICS_WORKSPACE_ID="------------------------"  
 LOG_ANALYTICS_PRIMARY_KEY="--------------------"  

Step 4 (Windows users only) – If you are on Windows, you might need to convert the shell script from DOS (CRLF) to Unix (LF) line endings; an example command follows the list.

 https://sourceforge.net/projects/dos2unix/  
 or   
 Notepad++ > Edit > EOL Conversion > Unix (LF)  
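
If you go the dos2unix route, the conversion is a single command run from the root of the cloned repo (the path assumes the repo's default layout):

 dos2unix code/appinsights_logging_init.sh  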

Step 5 – Create the required directories inside Databricks file storage and upload the files.

 pip install databricks-cli  

Configure the CLI with the Databricks host URL and a personal access token. The host URL is shown in the Azure portal:

 Azure portal > Azure Databricks Service > Overview > URL  

To generate a token, launch the workspace:

 Right top corner > User settings > Generate new token  
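
A minimal CLI configuration sketch (the host URL below is a placeholder; paste the generated token when prompted):

 databricks configure --token  
 # Databricks Host (should begin with https://): https://adb-1234567890123456.7.azuredatabricks.net  
 # Token: <paste the token generated above>  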

Create a DBFS directory for Application Insights:

 dbfs mkdirs dbfs:/databricks/appinsights  

Upload the jars:

 dbfs cp --overwrite applicationinsights-core-2.6.1.jar dbfs:/databricks/appinsights/  
 dbfs cp --overwrite applicationinsights-logging-log4j1_2-2.6.1.jar dbfs:/databricks/appinsights/  
 dbfs cp --overwrite adbxmonitor_2.12-0.1.jar dbfs:/databricks/appinsights/  

Upload the shell script:

 dbfs cp --overwrite code/appinsights_logging_init.sh dbfs:/databricks/appinsights/appinsights_logging_init.sh  

Verify the files:

 dbfs ls dbfs:/databricks/appinsights  

Step 6 – Create Databricks cluster

I am using Databricks Runtime 7.5 (Spark 3.0.1 and Scala 2.12).

Clusters > Create cluster

Enable logging so that we can debug our init script errors.

Enter path – dbfs:/cluster-logs

Add the init script path – dbfs:/databricks/appinsights/appinsights_logging_init.sh

Now confirm and create the cluster.
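
If you prefer to script this instead of clicking through the UI, a cluster with the same log path and init script can be created with the Databricks CLI. This is a rough sketch; the cluster name, node type, and worker count are placeholder values, not part of the original walkthrough:

 databricks clusters create --json '{  
   "cluster_name": "appinsights-demo",  
   "spark_version": "7.5.x-scala2.12",  
   "node_type_id": "Standard_DS3_v2",  
   "num_workers": 2,  
   "cluster_log_conf": {"dbfs": {"destination": "dbfs:/cluster-logs"}},  
   "init_scripts": [{"dbfs": {"destination": "dbfs:/databricks/appinsights/appinsights_logging_init.sh"}}]  
 }'  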

Once the cluster is up and running, create a Scala notebook and run the sample code.

Get the instrumentation key that we set in the init script:

 %sh echo $APPINSIGHTS_INSTRUMENTATIONKEY  

And push some log messages to Application Insights:

 import org.apache.log4j.LogManager  
 val log = LogManager.getRootLogger  
 log.warn("WARN: Hi from Databricks")  
 log.info("INFO: Hi from Databricks")  
 log.error("ERROR: Hi from Databricks")  

It may take a few minutes for the messages to appear in Application Insights.


