Amazon Managed Workflows for Apache Airflow (Amazon MWAA) provides a secure and managed environment to run Apache Airflow on AWS. Airflow is often used in highly regulated industries, such as finance and healthcare. These customers might want to restrict access and traffic further than the Amazon MWAA default configurations do in order to strengthen their security posture. This post covers some recommended practices.
The principle of least privilege is a fundamental tenet that should be followed diligently. When configuring AWS services, it's essential to grant only the minimum required permissions to resources, avoiding overly broad or permissive policies.
In this post, we explore how to apply the principle of least privilege to your Amazon MWAA environment by tightening network security using security groups, network access control lists (ACLs), and virtual private cloud (VPC) endpoints. We also discuss the Amazon MWAA execution and deployment roles and their respective permissions.
Understanding the Amazon MWAA environment
When an Amazon MWAA environment is created, resources are created in an AWS managed service VPC and your customer managed VPC. In the customer VPC provided at environment creation, the resources required to run the Airflow environment are deployed, including schedulers and workers running on Amazon Elastic Container Service (Amazon ECS) clusters. These clusters are deployed in your VPC, and they use elastic network interfaces (ENIs) with private IP addresses in the customer account. These ENIs span private subnets across two Availability Zones to connect to the Airflow database and web server, which reside in the service-owned account (if in private access mode). The following diagram illustrates this architecture.
VPC security groups act as virtual firewalls that control network traffic at the ENI level, or instance level. Security groups are stateful, meaning that responses to allowed inbound traffic are automatically permitted outbound, and vice versa. The default configuration of a new security group in a VPC starts with no inbound rules and an outbound rule allowing all traffic. By definition, a security group with no inbound rules denies all ingress traffic except responses to traffic that was allowed out through the 0.0.0.0/0 outbound rule.
Amazon MWAA offers two web server access modes inside the customer VPC: public and private. Public web server mode must have a way for traffic to reach the web servers in the customer-owned VPC through the public internet. This requires routing to the public internet using public subnets and a NAT gateway. A NAT gateway can be used to provide internet access for resources in private subnets. With private access mode, the security group for the Amazon MWAA environment doesn't need to allow traffic to and from the NAT gateway, and access to the Airflow UI is granted only to users with appropriate permissions from within the VPC. An Application Load Balancer is provisioned only in public mode to route traffic to the public web servers. The customer must provision the rest of the networking components.
If your Amazon MWAA environment needs to communicate with resources outside your VPC (such as external data sources or APIs), you might need to configure appropriate security group rules and routing to allow the required traffic. In such cases, you'll typically use a NAT gateway or VPN connection to facilitate the communication between your Amazon MWAA environment and the external resources, and VPC endpoints for AWS resources.
For tighter security restrictions, an environment with private routing and no internet access is possible, where finer-grained security group rules and VPC endpoint policies can be applied. Because this post is about least privilege, we focus on the minimum security requirements needed for an Amazon MWAA environment.
Security groups: Minimizing permissions
Your Amazon MWAA environment will have a security group associated with the environment's resources in your VPC. This security group is also used by the ENIs created by the interface VPC endpoint that is used to communicate with the database and web server. By default, security groups deny all inbound traffic, and security group rules must be explicitly stated, denoting the ports and the source from which the instance will allow network traffic. At a minimum, the Amazon MWAA environment must allow traffic to and from the Amazon Aurora PostgreSQL-Compatible Edition metadata database that is owned and managed by Amazon MWAA. The metadata database is a critical component of Airflow that acts as a centralized source of truth for task execution, configuration, and monitoring. Both the scheduler and workers require access to this database to perform their respective roles in orchestrating and running tasks. This database listens on TCP port 5432. Additionally, the web server traffic can be restricted to HTTPS through TCP port 443. At a minimum, the Amazon MWAA security group must have the two inbound rules detailed in the following table.
| Type | Protocol | Port Range | Source Type | Source |
| --- | --- | --- | --- | --- |
| Custom TCP | TCP | 5432 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
| HTTPS | TCP | 443 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
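The two self-referencing inbound rules can also be expressed as data. The following is a minimal Python sketch, not a definitive implementation: it builds the rule structure in the shape that the boto3 `authorize_security_group_ingress` call accepts. The group ID `sg-xxxxx` is the placeholder from the table, and the helper name `self_referencing_rule` and the rule descriptions are hypothetical.

```python
import json

# Placeholder taken from the table above; substitute your Amazon MWAA security group ID.
MWAA_SG_ID = "sg-xxxxx"


def self_referencing_rule(port, description, group_id=MWAA_SG_ID):
    """Build one inbound TCP rule whose source is the security group itself."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": group_id, "Description": description}],
    }


ip_permissions = [
    self_referencing_rule(5432, "Aurora PostgreSQL metadata database"),
    self_referencing_rule(443, "HTTPS to the Airflow web server"),
]

print(json.dumps(ip_permissions, indent=2))

# With boto3 installed and credentials configured, the rules could be applied with:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=MWAA_SG_ID, IpPermissions=ip_permissions)
```

Keeping the rules as plain data makes it easy to review them before anything is changed in the account.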
Many customers have other AWS resources residing in VPCs, to which the Amazon MWAA workers need access. These resources can be granted network access in a private routing configuration using security groups as well. If the resource sits in the same security group, add an additional inbound rule with the port needed. For example, if an Amazon Redshift cluster sits in the same security group, add the following rule.
| Type | Protocol | Port Range | Source Type | Source |
| --- | --- | --- | --- | --- |
| Custom TCP | TCP | 5439 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
If the Redshift cluster is in a different security group, change the source to the Redshift security group.
| Type | Protocol | Port Range | Source Type | Source |
| --- | --- | --- | --- | --- |
| Custom TCP | TCP | 5439 | Custom | sg-xxxxx / redshift-security-group |
If the resources are in another VPC, then VPC peering must be enabled before referencing that other VPC's security group. For resources that don't reside in a subnet, a VPC endpoint will also provide private routing between the Amazon MWAA environment and those resources. For example, a VPC endpoint for Amazon Simple Storage Service (Amazon S3) can provide enhanced security, improved performance, and lower costs.
Network ACLs: Minimizing permissions
Network ACLs manage inbound and outbound traffic at the subnet level through allow and deny rules. An ACL is stateless, which means that inbound and outbound rules must be specified separately and explicitly. It is used to specify the types of network traffic that are allowed in or out of the instances in a VPC network.
Every Amazon VPC has a default ACL that allows all inbound and outbound traffic, with rules as follows.
| Rule number | Type | Protocol | Port Range | Source | Allow/Deny |
| --- | --- | --- | --- | --- | --- |
| 100 | All IPv4 traffic | All | All | 0.0.0.0/0 | Allow |
| * | All IPv4 traffic | All | All | 0.0.0.0/0 | Deny |
You can edit the default ACL rules or create a custom ACL and attach it to your subnets. A subnet can have only one ACL attached to it at any time, but one ACL can be attached to multiple subnets. To implement least privilege in your Amazon MWAA environment, restrict the inbound ACL to allow traffic from the metadata database and web server, and restrict the outbound ACL to allow traffic only to the clients in the private subnet. Note that the following examples use example private IPs for the subnets.
Inbound NACL
| Rule number | Type | Protocol | Port Range | Source | Allow/Deny | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | Custom TCP | TCP | 5432 | 10.192.21.0/16 | Allow | Allows inbound database traffic from private subnet |
| 110 | HTTPS | TCP | 443 | 10.192.21.0/16 | Allow | Allows inbound HTTPS traffic from private subnet |
| * | All traffic | All | All | 0.0.0.0/0 | Deny | Denies all inbound IPv4 traffic not already handled by a preceding rule (not modifiable) |
Outbound NACL
| Rule number | Type | Protocol | Port Range | Destination | Allow/Deny | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | Custom TCP | TCP | 1024-65535 | 10.192.21.0/24 | Allow | Allows outbound return IPv4 traffic to clients in private subnet |
| * | All traffic | All | All | 0.0.0.0/0 | Deny | Denies all outbound IPv4 traffic not already handled by a preceding rule (not modifiable) |
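Because a network ACL is stateless, the inbound allows and the outbound return-traffic allow have to be defined as separate entries. The following Python sketch illustrates this by building the three custom entries as keyword arguments in the shape accepted by the boto3 `create_network_acl_entry` call. It assumes a single example subnet CIDR of 10.192.21.0/24; the helper name `nacl_entry` and the ACL ID `acl-xxxxx` are hypothetical. The final `*` deny rule is implicit and is not created by you.

```python
import ipaddress

# Assumed example CIDR for one private subnet; replace with your own.
PRIVATE_SUBNET = ipaddress.ip_network("10.192.21.0/24")


def nacl_entry(rule_number, from_port, to_port, egress, cidr=PRIVATE_SUBNET):
    """Build keyword arguments for one TCP allow entry in a network ACL."""
    return {
        "RuleNumber": rule_number,
        "Protocol": "6",  # IANA protocol number for TCP
        "RuleAction": "allow",
        "Egress": egress,  # False = inbound rule, True = outbound rule
        "CidrBlock": str(cidr),
        "PortRange": {"From": from_port, "To": to_port},
    }


entries = [
    nacl_entry(100, 5432, 5432, egress=False),  # inbound database traffic
    nacl_entry(110, 443, 443, egress=False),    # inbound HTTPS traffic
    nacl_entry(100, 1024, 65535, egress=True),  # outbound return traffic on ephemeral ports
]

for entry in entries:
    direction = "outbound" if entry["Egress"] else "inbound"
    ports = entry["PortRange"]
    print(f'{direction} rule {entry["RuleNumber"]}: tcp {ports["From"]}-{ports["To"]} {entry["CidrBlock"]}')

# With boto3, each entry could be applied to a hypothetical ACL ID like this:
#   boto3.client("ec2").create_network_acl_entry(NetworkAclId="acl-xxxxx", **entry)
```

Inbound and outbound rule numbers are independent, which is why rule 100 can appear once in each direction.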
VPC endpoints: Minimizing permissions
When you create an Amazon MWAA environment, it's deployed within a VPC. This allows you to control the network access and security of your Airflow deployment. However, some customer workloads running in the Amazon MWAA environment might need to orchestrate tasks using other AWS services that reside outside of your VPC, such as Amazon S3 to access data, AWS Glue to start ETL (extract, transform, and load) jobs, or Amazon Redshift to run data warehouse queries. To establish a secure and private connection between your Amazon MWAA environment and these external AWS services, you can use VPC endpoints. VPC endpoints are virtual devices that are provisioned within your VPC and act as an entry point for the required AWS service, allowing your Amazon MWAA environment to communicate with the service using a private IP address, without needing to go through the public internet. The following diagram illustrates this architecture.
VPC endpoints let you keep your Amazon MWAA environment's network traffic within the AWS network, reducing exposure to the public internet and improving the overall security of your Airflow deployment. Although private VPC endpoints are automatically created for the database and web server, to create a least privileged environment without internet access, additional VPC endpoints are needed for the other resources that Amazon MWAA requires. Amazon S3, Amazon Simple Queue Service (Amazon SQS), Amazon CloudWatch, and optionally AWS Key Management Service (AWS KMS) will need VPC endpoints created. For more details, see Creating the required VPC service endpoints in an Amazon VPC with private routing. Outside of the required services, many customers run Amazon MWAA workflows that orchestrate additional AWS services, such as Amazon Redshift, Amazon EMR, and AWS Glue. Let's look at an example VPC endpoint that we want to use to connect to Amazon Redshift, which is commonly called in Airflow DAGs using the Redshift operator for workflows that interact with Amazon Redshift as a data warehouse. For more information on creating Amazon VPC interface endpoints, see Access an AWS service using an interface VPC endpoint.
Create a VPC endpoint
Complete the following steps to create a VPC endpoint using Amazon Virtual Private Cloud (Amazon VPC):
On the Amazon VPC console, create a new VPC endpoint for the com.amazonaws.region.redshift service, where region is the AWS Region where your Amazon MWAA environment and Redshift cluster are located. Make sure that private DNS is enabled.
Create a VPC endpoint policy. This can be used to limit access to the Redshift cluster to only the Amazon MWAA environment, preventing unauthorized access from other resources. The following is an example policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:role/YourMWAAExecutionRoleName"
        ]
      },
      "Action": [
        "redshift:DescribeClusters",
        "redshift:DescribeClusterParameters",
        "redshift:DescribeClusterSecurityGroups",
        "redshift:DescribeClusterSubnetGroups",
        "redshift:DescribeEventSubscriptions",
        "redshift:DescribeLoggingStatus",
        "redshift:DescribeReservedNodeOfferings",
        "redshift:DescribeReservedNodes",
        "redshift:DescribeTableRestoreStatus",
        "redshift:DescribeTags",
        "redshift:GetClusterCredentials",
        "redshift:ListTagsForResource",
        "redshift:PurchaseReservedNodeOffering",
        "redshift:ResetClusterParameterGroup",
        "redshift:RestoreFromClusterSnapshot",
        "redshift:RevokeClusterSecurityGroupIngress",
        "redshift:RevokeSnapshotAccess",
        "redshift:ViewQueriesInConsole"
      ],
      "Resource": "arn:aws:redshift:us-east-1:123456789012:cluster/my-redshift-cluster"
    }
  ]
}
The policy contains the following parameters:
The Version field specifies the policy language version.
The Statement section contains a single statement that allows the specified actions on the Redshift cluster.
The Effect field is set to Allow, which means the policy grants the specified permissions.
The Principal field specifies the AWS Identity and Access Management (IAM) role that serves as your Amazon MWAA execution role, which is allowed to access the Redshift cluster.
The Action field lists the specific Redshift actions that the Amazon MWAA execution role is allowed to perform, such as describing the cluster, getting cluster credentials, and restoring from a snapshot.
The Resource field specifies the Amazon Resource Name (ARN) of the Redshift cluster that the policy applies to.
Associate the VPC endpoint with the correct route table. This route table should be the one used by the subnets where your Amazon MWAA environment is deployed. If using a VPC interface endpoint, associate the endpoint with the two private subnets and the security group used by Amazon MWAA.
Make sure that the security groups associated with the Amazon MWAA environment and the Redshift cluster allow the required inbound and outbound traffic between them. This typically includes allowing access on the Redshift port (5439 by default) from the Amazon MWAA environment's security group.
In the Airflow UI, which you can open from the Amazon MWAA console, choose Admin, then Connections, and update the Redshift connection details to use the VPC endpoint address instead of the public Redshift endpoint. This makes sure that the connection between Amazon MWAA and Amazon Redshift is secure and stays within the VPC.
By configuring VPC endpoints for the AWS services your Amazon MWAA environment needs to access, you can provide secure, private, and efficient communication between your Airflow deployment and AWS resources.
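The console steps for creating the endpoint can be sketched as a single request. The following Python example is an illustration under stated assumptions rather than a definitive implementation: it assembles the parameters for the boto3 `create_vpc_endpoint` call for an interface endpoint with private DNS enabled and an attached endpoint policy. The Region, VPC ID, subnet IDs, security group ID, and the shortened action list are all placeholders chosen for the example.

```python
import json

REGION = "us-east-1"  # assumed Region, matching the example cluster ARN

# A trimmed-down endpoint policy for illustration; the principal and cluster ARN are placeholders.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam::123456789012:role/YourMWAAExecutionRoleName"]},
            "Action": ["redshift:DescribeClusters", "redshift:GetClusterCredentials"],
            "Resource": f"arn:aws:redshift:{REGION}:123456789012:cluster/my-redshift-cluster",
        }
    ],
}

# Hypothetical IDs; use the VPC, private subnets, and security group of your environment.
endpoint_request = {
    "VpcEndpointType": "Interface",
    "VpcId": "vpc-xxxxx",
    "ServiceName": f"com.amazonaws.{REGION}.redshift",
    "SubnetIds": ["subnet-aaaaa", "subnet-bbbbb"],
    "SecurityGroupIds": ["sg-xxxxx"],
    "PrivateDnsEnabled": True,
    "PolicyDocument": json.dumps(endpoint_policy),
}

print(json.dumps(endpoint_request, indent=2))

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("ec2", region_name=REGION).create_vpc_endpoint(**endpoint_request)
```

The policy is passed as a JSON string, so building it as a dictionary first keeps it easy to review and reuse.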
Restricting traffic within AWS with customer managed endpoints for Amazon MWAA resources
As mentioned earlier, Amazon MWAA integrates with various AWS services, such as CloudWatch for logging, Amazon S3 for DAGs and requirements, Amazon SQS as messaging middleware, and optionally AWS KMS for encryption. You can create VPC endpoints for these services to make sure traffic stays within the AWS network. Access to these endpoints can be restricted by allowing only the Amazon MWAA security group as the ingress source. For details on how to create these endpoints and policies, see Introducing shared VPC support on Amazon MWAA. If the Amazon MWAA environment was updated after April 2, 2024, it will be on AWS Fargate v1.4 and won't use Amazon Elastic Container Registry (Amazon ECR), so you won't need to create a VPC endpoint for it.
Managing permissions to deploy an Amazon MWAA environment
To create and deploy an Amazon MWAA environment, you need to have the appropriate permissions granted to your IAM user or role. The required permissions can be granted through an IAM policy attached to your user or role. When you create an Amazon MWAA environment, you can specify an execution role that will be assumed by the Airflow workers to perform tasks. The execution role should have the necessary permissions to access the required AWS services and resources based on your workflow requirements. It's important to follow the principle of least privilege when granting permissions to IAM roles and users. You should grant only the minimum permissions required for your Amazon MWAA environment and Airflow workflows to function correctly.
Amazon MWAA trust policy
Amazon MWAA needs to be able to assume the execution role in order to perform actions on your behalf. To do this, create a trust policy that allows the Amazon MWAA service to call AssumeRole. To avoid the confused deputy problem, add a condition to the trust policy, replacing the AWS account number and Region as needed. The following is an example policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ["airflow.amazonaws.com", "airflow-env.amazonaws.com"]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:airflow:your-region:123456789012:environment/your-environment-name"
        },
        "StringEquals": {
          "aws:SourceAccount": "123456789012"
        }
      }
    }
  ]
}
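Because the account number, Region, and environment name all need to be substituted, it can help to generate the trust policy rather than edit it by hand. The following is a minimal Python sketch that assumes the same structure as the example above; the function name `mwaa_trust_policy` and the sample environment name `my-mwaa-env` are hypothetical.

```python
import json


def mwaa_trust_policy(region, account_id, environment_name):
    """Build a trust policy scoped to one environment to avoid the confused deputy problem."""
    source_arn = f"arn:aws:airflow:{region}:{account_id}:environment/{environment_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": ["airflow.amazonaws.com", "airflow-env.amazonaws.com"]
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "ArnLike": {"aws:SourceArn": source_arn},
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }


# Example values only; replace with your own Region, account ID, and environment name.
policy = mwaa_trust_policy("us-east-1", "123456789012", "my-mwaa-env")
print(json.dumps(policy, indent=2))
```

The printed JSON can be used as the role's trust relationship document.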
VPC endpoint permissions for the deployer role
Although the service-linked role creates the VPC endpoints, the deployer role requires permissions to create VPC endpoints and perform a dry run. You can limit these permissions by allowing the ec2:CreateVpcEndpoint action and specifying resource ARNs for VPC endpoints, VPCs, subnets, and security groups. Additionally, you can use the aws:CalledVia condition key to restrict access to the airflow.amazonaws.com service.
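As a rough illustration of such a scoped statement, the following Python sketch builds one IAM statement that allows only ec2:CreateVpcEndpoint on specific resources and adds an aws:CalledVia condition. This is a sketch under assumptions, not a verified deployer policy: the function name, all resource IDs, and the choice of the `ForAnyValue:StringEquals` operator are illustrative, and you should validate the result against your own deployment before relying on it.

```python
import json


def create_endpoint_statement(region, account_id, vpc_id, subnet_ids, security_group_id):
    """Build an Allow statement for ec2:CreateVpcEndpoint limited to named resources."""
    prefix = f"arn:aws:ec2:{region}:{account_id}"
    resources = [
        f"{prefix}:vpc-endpoint/*",  # the endpoint does not exist yet, so its ID is a wildcard
        f"{prefix}:vpc/{vpc_id}",
        f"{prefix}:security-group/{security_group_id}",
    ]
    resources += [f"{prefix}:subnet/{subnet_id}" for subnet_id in subnet_ids]
    return {
        "Effect": "Allow",
        "Action": "ec2:CreateVpcEndpoint",
        "Resource": resources,
        # aws:CalledVia is a multivalued key, so a set operator is used here (assumption).
        "Condition": {
            "ForAnyValue:StringEquals": {"aws:CalledVia": ["airflow.amazonaws.com"]}
        },
    }


# Hypothetical IDs for illustration only.
statement = create_endpoint_statement(
    "us-east-1", "123456789012", "vpc-xxxxx", ["subnet-aaaaa", "subnet-bbbbb"], "sg-xxxxx"
)
print(json.dumps(statement, indent=2))
```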
Amazon MWAA execution role: Required permissions
When creating an Amazon MWAA environment, you need to specify an execution role that grants the necessary permissions for Airflow to interact with other AWS services. Instead of using a wildcard policy, you can create a custom policy with the minimum required permissions.
The following is an example of an execution role policy that allows Amazon MWAA to interact with various services using an AWS managed key:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "airflow:PublishMetrics",
      "Resource": "arn:aws:airflow:{your-region}:{your-account-id}:environment/{your-environment-name}"
    },
    {
      "Effect": "Deny",
      "Action": "s3:ListAllMyBuckets",
      "Resource": [
        "arn:aws:s3:::{your-s3-bucket-name}",
        "arn:aws:s3:::{your-s3-bucket-name}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject*",
        "s3:GetBucket*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::{your-s3-bucket-name}",
        "arn:aws:s3:::{your-s3-bucket-name}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:PutLogEvents",
        "logs:GetLogEvents",
        "logs:GetLogRecord",
        "logs:GetLogGroupFields",
        "logs:GetQueryResults"
      ],
      "Resource": [
        "arn:aws:logs:{your-region}:{your-account-id}:log-group:airflow-{your-environment-name}-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetAccountPublicAccessBlock"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ChangeMessageVisibility",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:SendMessage"
      ],
      "Resource": "arn:aws:sqs:{your-region}:*:airflow-celery-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey",
        "kms:GenerateDataKey*",
        "kms:Encrypt"
      ],
      "Resource": "arn:aws:kms:{your-region}:{your-account-id}:key/{your-kms-cmk-id}",
      "Condition": {
        "StringLike": {
          "kms:ViaService": [
            "sqs.{your-region}.amazonaws.com",
            "s3.{your-region}.amazonaws.com"
          ]
        }
      }
    }
  ]
}
This policy grants Amazon MWAA the necessary permissions to interact with CloudWatch Logs, Amazon S3, Amazon SQS, and AWS KMS when using the AWS managed key offering, while explicitly specifying the resources it can access. You can further refine this policy based on your specific requirements.
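When refining an execution role policy, a quick way to see where it is still broad is to list the Allow statements whose resource is a bare wildcard. The following Python sketch shows one way to do that. The function name `wildcard_resource_statements` is hypothetical, and `sample_policy` is a shortened stand-in for a full policy, not a complete execution role policy.

```python
def wildcard_resource_statements(policy):
    """Return the Allow statements whose Resource is, or contains, a bare "*"."""
    flagged = []
    for statement in policy.get("Statement", []):
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]  # IAM allows a single string or a list
        if statement.get("Effect") == "Allow" and "*" in resources:
            flagged.append(statement)
    return flagged


# Shortened example policy for illustration.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "cloudwatch:PutMetricData", "Resource": "*"},
        {"Effect": "Allow", "Action": ["logs:DescribeLogGroups"], "Resource": ["*"]},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject*"],
            "Resource": ["arn:aws:s3:::{your-s3-bucket-name}/*"],
        },
    ],
}

for statement in wildcard_resource_statements(sample_policy):
    print(statement["Action"])
```

A wildcard resource is not always avoidable, because some actions don't support resource-level permissions, but each flagged statement is worth a deliberate review.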
The following is an example of an execution role policy that allows Amazon MWAA to interact with various services using a KMS customer managed key:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "s3:ListAllMyBuckets",
      "Resource": [
        "arn:aws:s3:::{your-s3-bucket-name}",
        "arn:aws:s3:::{your-s3-bucket-name}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject*",
        "s3:GetBucket*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::{your-s3-bucket-name}",
        "arn:aws:s3:::{your-s3-bucket-name}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "logs:PutLogEvents",
        "logs:GetLogEvents",
        "logs:GetLogRecord",
        "logs:GetLogGroupFields",
        "logs:GetQueryResults"
      ],
      "Resource": [
        "arn:aws:logs:{your-region}:{your-account-id}:log-group:airflow-{your-environment-name}-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetAccountPublicAccessBlock"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ChangeMessageVisibility",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:SendMessage"
      ],
      "Resource": "arn:aws:sqs:{your-region}:*:airflow-celery-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey",
        "kms:GenerateDataKey*",
        "kms:Encrypt"
      ],
      "Resource": "arn:aws:kms:{your-region}:{your-account-id}:key/{your-kms-cmk-id}",
      "Condition": {
        "StringLike": {
          "kms:ViaService": [
            "sqs.{your-region}.amazonaws.com",
            "s3.{your-region}.amazonaws.com"
          ]
        }
      }
    }
  ]
}
If you use the customer managed key, attach the following JSON policy statement to the key to provide access to the Airflow logs in CloudWatch Logs:
{
  "Sid": "Allow logs access",
  "Effect": "Allow",
  "Principal": {
    "Service": "logs.{your-region}.amazonaws.com"
  },
  "Action": [
    "kms:Encrypt*",
    "kms:Decrypt*",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:Describe*"
  ],
  "Resource": "*",
  "Condition": {
    "ArnLike": {
      "kms:EncryptionContext:aws:logs:arn": "arn:aws:logs:{your-region}:{your-account-id}:*"
    }
  }
}
You can attach multiple policies to the execution role as needed to allow your workers to access additional AWS resources. For example, let's explore how to enable Amazon EMR access. You can create a JSON policy that contains the narrowest permissions you can configure, as in the following example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:DescribeStep",
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:RunJobFlow"
      ],
      "Resource": "arn:aws:elasticmapreduce:*:xxxxxxxxxxxx:cluster/*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": [
        "arn:aws:iam::xxxxxxxxxxxx:role/EMR_EC2_DefaultRole",
        "arn:aws:iam::xxxxxxxxxxxx:role/EMR_DefaultRole"
      ]
    }
  ]
}
Conclusion
In this post, we discussed best practices for least privilege configuration in Amazon MWAA. By following these approaches, you can adhere to the principle of least privilege and maintain a secure posture within your Amazon MWAA environment, without compromising functionality or relying on overly permissive policies. Security is always the top priority; to learn more about security in Amazon MWAA, see Security in Amazon Managed Workflows for Apache Airflow and Security best practices on Amazon MWAA.
About the Authors
Elizabeth Davis is a Sr. Solutions Architect at Amazon Web Services (AWS). She currently works with educational technology companies and has a passion for serverless and data orchestration technologies. She has been an Amazon MWAA subject matter expert (SME) for the last 3+ years.
Mark Richman is a Principal Solutions Architect at Amazon Web Services with 30 years of experience building complex web and enterprise software. He contributes to Apache Airflow, bringing his expertise in cloud computing and serverless technologies to the open-source platform. Mark is also an accomplished writer and speaker who has authored industry publications and AWS courses while regularly presenting at industry events.