User:Gholms/EC2 Mirror Proposal

From FedoraProject

Problem Space

The user experience on Fedora VMs running on Amazon EC2 would benefit from yum mirrors hosted within Amazon's cloud network. In particular...

Solution Overview

The most economical solution is to place yum mirrors in region-specific S3 buckets and direct clients toward the mirror inside their own region. Fedora will need to create these buckets and keep them up to date with a script, because S3 does not offer reliable direct filesystem access. This is the solution Amazon uses for its newly released Amazon Linux repositories.

AWS Credentials

Fedora needs an AWS account to manage these buckets. To push content to an S3 bucket, a script needs a set of REST API keys that grant it access. People most commonly use the keys of the account that pays for and manages the bucket. To limit the damage if a key is compromised, however, each region will use its own set of task-specific keys created with Amazon's IAM service.

S3 Buckets

The S3 buckets themselves will contain mirrors of Fedora's i686 and x86_64 repositories for every release we publish on EC2. Clients inside EC2 can then access yum repositories via region-specific URIs such as http://fedora-mirror-us-west-1.s3.amazonaws.com/fedora/linux/releases/13/Everything/x86_64/os/.

Since Amazon charges for data transfer from S3 buckets to the rest of the Internet, these buckets will be accessible only to clients inside EC2. S3 allows access rules based on client IP addresses (through bucket policies rather than object ACLs), so we will block outside access to these mirrors by allowing requests only from EC2-internal IP addresses (the 10.x.x.x range).

Question
Does the hostname in the above URI resolve to an IP address internal to EC2 when an EC2 instance attempts to resolve it?
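As a rough illustration of the IP-based restriction described above, the following sketch builds an S3 bucket policy document that allows anonymous reads only from a given address range. The function name `ec2_only_policy` and the statement ID are hypothetical. The bucket name is taken from the example URI, and 10.0.0.0/8 is the range named in this proposal. Whether S3 actually sees EC2 clients as coming from that range is exactly the open question above, so treat this as a sketch under that assumption rather than a tested configuration.

```python
import json


def ec2_only_policy(bucket, cidr="10.0.0.0/8"):
    """Build a bucket policy allowing object reads only from `cidr`.

    Assumption: requests from EC2 instances reach S3 with a source
    address inside `cidr`. This proposal has not confirmed that.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromEC2InternalAddresses",  # hypothetical label
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::%s/*" % bucket,
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to attach to the bucket.
    print(json.dumps(ec2_only_policy("fedora-mirror-us-west-1"), indent=2))
```

Because nothing else grants access, requests from addresses outside the range would be refused by default, provided the bucket and its objects carry no public-read ACL.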

Client Access

Yum needs to know which region a given client resides in so it can use the correct region's mirror. We cannot do this via MirrorManager's normal IP block-based mechanism because EC2 instances' IP addresses are too volatile.

While the VM images Fedora will provide are restricted to specific regions, encoding regions directly into these images presents two main difficulties:

A running instance can determine which region it is in by querying EC2's internal API. We can use this information either at boot time or whenever yum is called, so that yum always has up-to-date information about where the instance resides.

Possible solutions to this problem were discussed at several meetings [1][2] and in rel-eng ticket 4149. The accepted solution follows:

Recent versions of yum replace variables like $varname in their configuration files with the contents of /etc/yum/vars/varname, as long as such a file exists. At boot time, an init script will read the contents of http://169.254.169.254/latest/meta-data/placement/availability-zone (which does not exist outside EC2) and write an appropriate value to /etc/yum/vars/location.
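The boot-time step could look roughly like the sketch below. It is not the actual init script. The helper names are hypothetical, and the proposal does not say which value MirrorManager expects, so the script assumes the region name, obtained by dropping the trailing zone letter (for example us-west-1a becomes us-west-1). Outside EC2 the metadata request fails and no file is written.

```python
import os
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"
VARS_FILE = "/etc/yum/vars/location"


def zone_to_region(zone):
    """Assumed mapping: strip the trailing zone letter, 'us-west-1a' -> 'us-west-1'."""
    zone = zone.strip()
    if zone and zone[-1].isalpha() and zone[-2:-1].isdigit():
        return zone[:-1]
    return zone


def fetch_zone(url=METADATA_URL, timeout=2):
    """Return the availability zone, or None if the metadata service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("ascii").strip()
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses; this is the
        # expected path on machines outside EC2.
        return None


def write_location(zone, path=VARS_FILE):
    """Write the yum variable file; return False and write nothing if zone is empty."""
    if not zone:
        return False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(zone_to_region(zone) + "\n")
    return True


if __name__ == "__main__":
    write_location(fetch_zone())
```

The short timeout keeps boot from stalling on machines where the link-local metadata address does not answer.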

Yum will then pass this value to MirrorManager through an additional location parameter, added by appending &location=$location to the end of the metalink URIs in Fedora's stock repository files. MirrorManager will look up the value and prepend the relevant mirror(s), if any, to the mirror list it returns. Bare metal machines will lack this file and pass &location=$location, verbatim, to MirrorManager, which will find no results for that value and return a standard mirror list.
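The substitution behaviour described above can be modelled with a few lines of Python. This is a simplified stand-in for illustration, not yum's own code: `expand_vars` is a hypothetical name, the metalink URI is an example, and yum's built-in variables such as $releasever are not handled here.

```python
import os
import re


def expand_vars(text, vars_dir="/etc/yum/vars"):
    """Replace $name with the first line of vars_dir/name when that file exists.

    Variables with no matching file are left verbatim, which models a
    bare metal machine sending the literal string $location.
    """
    def repl(match):
        path = os.path.join(vars_dir, match.group(1))
        try:
            with open(path) as f:
                return f.readline().strip()
        except OSError:
            return match.group(0)

    return re.sub(r"\$(\w+)", repl, text)


if __name__ == "__main__":
    # Example URI only; the repo and arch values are filled in for brevity.
    uri = ("https://mirrors.fedoraproject.org/metalink"
           "?repo=fedora-13&arch=x86_64&location=$location")
    print(expand_vars(uri))
```

On an EC2 instance where the boot-time script has written the file, the printed URI ends with the stored value. Elsewhere it ends with the literal $location.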

The server-side code to accomplish this is present in MirrorManager's 1.4 branch. MirrorManager ignores parameters it does not recognize, so sending such a URI to a server that does not support this parameter still results in a useful mirror list.

Updating S3 Mirrors

S3 buckets are accessible via a REST API, which makes normal filesystem access difficult and very slow at best. Instead, we will use a script that fetches updated packages and metadata files and pushes them to each region's S3 bucket. This script will run either on Fedora's regular infrastructure or on one EC2 instance per region, each of which uses separate credentials.
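One way such a script might decide what to push is to compare checksums of the local mirror tree against a listing of the bucket. The sketch below covers only that planning step and leaves the actual S3 calls out. The function names are hypothetical, and the remote listing is assumed to be a mapping from object key to MD5 hex digest (S3 reports an MD5-based ETag for objects uploaded in a single plain PUT).

```python
import hashlib
import os


def local_checksums(root):
    """Map each file under root (as a '/'-separated key) to its MD5 hex digest."""
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            key = os.path.relpath(full, root).replace(os.sep, "/")
            digest = hashlib.md5()
            with open(full, "rb") as f:
                # Hash in chunks so large packages are not read into memory at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            sums[key] = digest.hexdigest()
    return sums


def plan_sync(local, remote):
    """Return (uploads, deletes): keys to push to the bucket and keys to remove from it."""
    uploads = sorted(k for k, v in local.items() if remote.get(k) != v)
    deletes = sorted(k for k in remote if k not in local)
    return uploads, deletes
```

A driver would call `plan_sync(local_checksums(tree), remote_listing)` once per region and then perform the uploads and deletions with that region's credentials.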

Idea
Can Amazon's VPC service connect to Fedora's OpenVPN gateway so Release Engineering people can access them normally?

Action Items

Finalize this Proposal

Implement and Document