Connect Amazon EMR to Digital Tap AI
1 Prerequisites
- AWS credentials — Access Key + Secret Key, or IAM Role (recommended)
- AWS Region where your EMR clusters run
- Digital Tap AI account — sign up free
2 Required IAM Permissions
Create an IAM policy with the following permissions. They are the minimum Digital Tap AI needs to discover and optimize your EMR clusters:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DigitalTapEMRRead",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListSteps",
        "elasticmapreduce:DescribeStep",
        "elasticmapreduce:ListBootstrapActions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DigitalTapEMRManage",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:SetTerminationProtection",
        "elasticmapreduce:TerminateJobFlows",
        "elasticmapreduce:ModifyInstanceGroups",
        "elasticmapreduce:PutAutoScalingPolicy"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DigitalTapCloudWatch",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DigitalTapCostExplorer",
      "Effect": "Allow",
      "Action": [
        "ce:GetCostAndUsage",
        "ce:GetCostForecast"
      ],
      "Resource": "*"
    }
  ]
}
💡 Least privilege: For monitor-only mode, remove the DigitalTapEMRManage statement. The agent will detect idle clusters and generate recommendations without taking action.
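As a rough sketch of the monitor-only variant, the snippet below removes the DigitalTapEMRManage statement from a policy document using only Python's standard library. The function name `monitor_only_policy` and the shortened policy literal are illustrative assumptions and not part of Digital Tap AI's tooling; in practice you would load your full policy JSON instead.

```python
import copy
import json

# Shortened copy of the policy from step 2; action lists are trimmed for brevity.
FULL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "DigitalTapEMRRead", "Effect": "Allow",
         "Action": ["elasticmapreduce:ListClusters",
                    "elasticmapreduce:DescribeCluster"],
         "Resource": "*"},
        {"Sid": "DigitalTapEMRManage", "Effect": "Allow",
         "Action": ["elasticmapreduce:TerminateJobFlows",
                    "elasticmapreduce:ModifyInstanceGroups"],
         "Resource": "*"},
        {"Sid": "DigitalTapCloudWatch", "Effect": "Allow",
         "Action": ["cloudwatch:GetMetricData"],
         "Resource": "*"},
        {"Sid": "DigitalTapCostExplorer", "Effect": "Allow",
         "Action": ["ce:GetCostAndUsage"],
         "Resource": "*"},
    ],
}


def monitor_only_policy(policy, manage_sid="DigitalTapEMRManage"):
    """Return a copy of `policy` without the statement whose Sid is `manage_sid`."""
    trimmed = copy.deepcopy(policy)  # leave the caller's document untouched
    trimmed["Statement"] = [
        stmt for stmt in trimmed["Statement"] if stmt.get("Sid") != manage_sid
    ]
    return trimmed


if __name__ == "__main__":
    # Print the read-only policy so it can be saved and attached to the role.
    print(json.dumps(monitor_only_policy(FULL_POLICY), indent=2))
```

The original document is copied rather than edited in place, so the full policy stays available if you later switch from monitor-only mode to active optimization.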
3 Install the Agent
Option A: Docker (with Access Keys)
docker run -d \
  --name digitaltap-agent \
  --restart unless-stopped \
  -e DT_API_KEY="your-digital-tap-api-key" \
  -e DT_PLATFORM="emr" \
  -e AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE" \
  -e AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLE" \
  -e AWS_DEFAULT_REGION="us-east-1" \
  ghcr.io/digital-tap/agent:latest
Option B: Docker (with IAM Role — recommended for EC2)
# Attach the IAM role to your EC2 instance, then:
docker run -d \
  --name digitaltap-agent \
  --restart unless-stopped \
  -e DT_API_KEY="your-digital-tap-api-key" \
  -e DT_PLATFORM="emr" \
  -e AWS_DEFAULT_REGION="us-east-1" \
  ghcr.io/digital-tap/agent:latest
Option C: Helm (Kubernetes with IRSA)
helm repo add digitaltap https://charts.digitaltap.ai
helm repo update
helm install digitaltap-agent digitaltap/agent \
  --set apiKey="your-digital-tap-api-key" \
  --set platform="emr" \
  --set aws.region="us-east-1" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::123456789012:role/digitaltap-role" \
  --namespace digitaltap --create-namespace
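For the Docker options (A and B), a small preflight check can catch a missing variable before the container starts. The sketch below is an assumption-laden illustration: it treats DT_API_KEY, DT_PLATFORM, and AWS_DEFAULT_REGION as required because both Docker commands set them, and it only flags the access-key pair when exactly one half is present (neither being set is taken to mean an IAM role is in use). The function name `check_agent_env` is hypothetical; the agent's own validation may differ.

```python
import os

# Variables set by both Docker commands above (assumed to be mandatory).
REQUIRED = ("DT_API_KEY", "DT_PLATFORM", "AWS_DEFAULT_REGION")
# Static credentials must be supplied together, or not at all (IAM role).
KEY_PAIR = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def check_agent_env(env):
    """Return a list of problems; an empty list means the settings look complete."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    present = [name for name in KEY_PAIR if env.get(name)]
    if len(present) == 1:
        missing = next(name for name in KEY_PAIR if name not in present)
        problems.append(f"{present[0]} is set but {missing} is not")
    return problems


if __name__ == "__main__":
    for problem in check_agent_env(os.environ):
        print("warning:", problem)
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to run against a `.env` file or a CI configuration as well.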
4 Verify Connection
- Open your Digital Tap AI dashboard
- Navigate to Integrations → Connected Platforms
- Your EMR clusters should appear within 3-5 minutes
5 EMR-Specific Features
- Idle Cluster Detection — Finds EMR clusters with no running steps and low HDFS/YARN utilization
- Auto-Termination — Terminates truly idle transient clusters to stop the meter
- Instance Group Optimization — Right-sizes core and task instance groups based on actual usage
- Spot Fleet Management — Optimizes spot vs on-demand mix in task groups
- Step Optimization — Analyzes Spark step performance and recommends config improvements
- Bootstrap Action Audit — Flags slow or redundant bootstrap actions adding startup time
- Cost Forecasting — Projects EMR spend by cluster, team, and workload type
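To make the idle-cluster rule concrete, here is a minimal sketch of a check in the spirit of "no running steps and low HDFS/YARN utilization". It is not Digital Tap AI's actual algorithm. The `ClusterSnapshot` type, the `is_idle` function, and both thresholds are hypothetical; the step states are the standard EMR ones, and the two percentages are assumed to come from the CloudWatch metrics YARNMemoryAvailablePercentage and HDFSUtilization.

```python
from dataclasses import dataclass
from typing import List

# EMR step states that mean work is still queued or in progress.
ACTIVE_STEP_STATES = {"PENDING", "RUNNING", "CANCEL_PENDING"}


@dataclass
class ClusterSnapshot:
    step_states: List[str]             # states of the cluster's steps (ListSteps)
    yarn_memory_available_pct: float   # share of YARN memory currently free
    hdfs_utilization_pct: float        # share of HDFS capacity currently used


def is_idle(snapshot, min_yarn_free_pct=95.0, max_hdfs_used_pct=10.0):
    """Hypothetical rule: no active steps, YARN almost entirely free, HDFS barely used."""
    if any(state in ACTIVE_STEP_STATES for state in snapshot.step_states):
        return False
    return (
        snapshot.yarn_memory_available_pct >= min_yarn_free_pct
        and snapshot.hdfs_utilization_pct <= max_hdfs_used_pct
    )


if __name__ == "__main__":
    finished = ClusterSnapshot(["COMPLETED", "COMPLETED"], 99.0, 2.0)
    print("idle" if is_idle(finished) else "busy")
```

A production check would likely also require the condition to hold over a time window, so that a short gap between steps does not count as idle.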
6 Troubleshooting
No clusters found
- Verify AWS_DEFAULT_REGION matches where your clusters run
- For multi-region, set DT_AWS_REGIONS="us-east-1,us-west-2,eu-west-1"
- Ensure IAM permissions include elasticmapreduce:ListClusters
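When checking the region settings, it can help to see which regions a given configuration would resolve to. The sketch below assumes that DT_AWS_REGIONS, when set, takes precedence over AWS_DEFAULT_REGION and that entries are comma-separated with optional whitespace; the guide does not document the agent's actual parsing, and `resolve_regions` is a hypothetical helper.

```python
import os


def resolve_regions(env):
    """Return the regions to scan: DT_AWS_REGIONS if set, else AWS_DEFAULT_REGION."""
    regions = []
    for part in env.get("DT_AWS_REGIONS", "").split(","):
        region = part.strip()
        if region and region not in regions:  # skip blanks and duplicates
            regions.append(region)
    if regions:
        return regions
    default = env.get("AWS_DEFAULT_REGION", "").strip()
    return [default] if default else []


if __name__ == "__main__":
    print(resolve_regions(os.environ))
```

An empty result means neither variable is set, which would be one explanation for "No clusters found".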
Authentication errors
- Verify credentials: aws sts get-caller-identity
- If using IAM roles, ensure the instance profile is attached
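The JSON printed by aws sts get-caller-identity contains an Arn field that shows whether the credentials belong to an IAM user or an assumed role. The helper below is a small sketch for reading that output; `describe_identity` is a hypothetical name, and the sample values (account ID, role name, session name) are placeholders.

```python
import json


def describe_identity(caller_identity_json):
    """Summarize `aws sts get-caller-identity` output as account, kind, and name."""
    identity = json.loads(caller_identity_json)
    # ARN layout: arn:partition:service:region:account:resource
    resource = identity["Arn"].split(":", 5)[5]
    kind, _, name = resource.partition("/")
    if kind == "assumed-role":
        name = name.split("/", 1)[0]  # drop the trailing session name
    return {"account": identity["Account"], "kind": kind, "name": name}


if __name__ == "__main__":
    sample = json.dumps({
        "UserId": "AROAEXAMPLEID:i-0abc",
        "Account": "123456789012",
        "Arn": "arn:aws:sts::123456789012:assumed-role/digitaltap-role/i-0abc",
    })
    print(describe_identity(sample))
```

If the kind is `user` on a host where you expected the instance profile, static access keys in the environment are probably taking precedence over the role.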