From Fedora Project Wiki
Latest revision as of 20:03, 1 November 2010

Problem Space

The user experience on Fedora VMs running on Amazon EC2 would benefit from yum mirrors hosted within Amazon's cloud network. In particular:

  • Such mirrors will be considerably faster.
  • Data transfer charges will be reduced.
    • Intra-region S3-to-EC2 traffic is free.
    • Intra-zone data transfer between EC2 instances is free.
  • Users with hundreds of EC2 instances will not place additional load on existing public mirrors.

Solution Overview

The most economical solution is to place yum mirrors in region-specific S3 buckets and direct each client toward the mirror inside its own region. Because S3 cannot reliably be accessed as a filesystem, Fedora will need to create these buckets and keep them up to date with a script. This is the same approach Amazon uses for its newly released Amazon Linux repositories.

AWS Credentials

Fedora needs an AWS account for managing these buckets. A script that pushes content to an S3 bucket needs a set of REST keys that grant it access. People most commonly use the keys of the account that pays for and manages the bucket. To limit the damage if keys are compromised, however, each region will instead use its own task-specific keys created with Amazon's IAM service.
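The page does not spell out what each region's keys may do. As a minimal sketch, assuming bucket names of the form fedora-mirror-<region> (extrapolated from the example URI under S3 Buckets) and assuming the push script only needs to list, upload, and delete objects, an IAM policy scoped to a single bucket could be generated like this. The helper name and the region list are illustrative only, not a vetted Fedora policy.

```python
import json


def mirror_push_policy(bucket: str) -> dict:
    """Build an IAM policy that lets one key pair manage exactly one mirror bucket.

    Hypothetical helper: the action list is a guess at the minimum a push
    script needs (list, upload, delete).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself...
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # ...while uploads and deletions are granted on the objects inside it.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # One policy, and therefore one IAM user and key pair, per region-specific bucket.
    for region in ["us-east-1", "us-west-1", "eu-west-1", "ap-southeast-1"]:
        print(json.dumps(mirror_push_policy(f"fedora-mirror-{region}"), indent=2))
```

Because each policy names only its own bucket, a leaked key for one region cannot modify the mirrors in any other region.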

S3 Buckets

The S3 buckets themselves will contain mirrors of Fedora's i686 and x86_64 repositories for every release we publish on EC2. Clients inside EC2 can then access yum repositories via region-specific URIs such as http://fedora-mirror-us-west-1.s3.amazonaws.com/fedora/linux/releases/13/Everything/x86_64/os/.

Since Amazon charges for data transfer from S3 buckets to the rest of the Internet, these buckets will be accessible only to clients inside EC2. S3's REST API allows a bucket owner to restrict access by requester IP address, so we will prevent outside access to these mirrors by allowing access only from EC2-internal IP addresses (the 10.x.x.x range).

Question: Does the hostname in the above URI resolve to an IP address internal to EC2 when an EC2 instance attempts to resolve it?
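In S3, IP-based restrictions are expressed as a condition in a bucket policy rather than in an object ACL. The sketch below builds such a policy for anonymous reads, assuming the hypothetical bucket name fedora-mirror-us-west-1 and the 10.0.0.0/8 range stated above. Whether requests from instances actually reach S3 from 10.x.x.x addresses is closely tied to the open question above; if they do not, the CIDR would have to change. The policy also only helps if the push script does not mark objects public-read through their own ACLs.

```python
import json


def ec2_only_read_policy(bucket: str, cidr: str = "10.0.0.0/8") -> str:
    """Return a bucket policy (as JSON) that allows anonymous GETs only from `cidr`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromEc2Internal",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Anonymous requests from other addresses match no Allow statement,
                # so they are refused unless an object ACL separately grants public read.
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(ec2_only_read_policy("fedora-mirror-us-west-1"))
```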

Client Access

Yum needs to know which region a given client resides in so it can use the correct region's mirror. We cannot do this via MirrorManager's normal IP block-based mechanism because EC2 instances' IP addresses are too volatile.

Although each VM image Fedora provides is tied to a specific region, hard-coding the region into the images presents two main difficulties:

  • We would have to spin a separate image for each region instead of using the same image globally.
  • Users can re-bundle their own versions of Fedora's stock images and start them in other regions. This would not only deny those users the benefits of this system but also force whoever funds the mirrors to pay for the resulting data transfer.

A running instance can query EC2's internal API to discover which region it is running in. We can use this information either at boot time or each time yum runs, so that yum always knows where the instance resides.

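Possible solutions to this problem were discussed at a cloud meeting on 2010-09-30 (http://meetbot.fedoraproject.org/fedora-meeting/2010-09-30/cloud.2010-09-30-21.01.log.html), at a FESCo meeting on 2010-10-05 (http://meetbot.fedoraproject.org/fedora-meeting/2010-10-05/fesco.2010-10-05-19.30.log.html), and in rel-eng ticket 4149 (https://fedorahosted.org/rel-eng/ticket/4149). The accepted solution is as follows: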

Recent versions of yum replace variables like $varname in their configuration files with the contents of /etc/yum/vars/varname, provided that file exists. At boot time, an init script will read http://169.254.169.254/latest/meta-data/placement/availability-zone (an address that is not reachable outside EC2) and write an appropriate value derived from it to /etc/yum/vars/location.
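The boot-time step could look roughly like the following Python 3 sketch. The page says only that "an appropriate value" is written; this sketch assumes that value is the region name, i.e. the availability zone minus its trailing letter (us-east-1a becomes us-east-1). The function names are hypothetical.

```python
import os
import urllib.request

# EC2 instance metadata endpoint named on this page; it is not reachable outside EC2.
ZONE_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"


def region_from_zone(zone: str) -> str:
    """Turn an availability zone such as 'us-east-1a' into its region, 'us-east-1'."""
    zone = zone.strip()
    # Zone names are the region name followed by a single trailing letter.
    return zone[:-1] if zone and zone[-1].isalpha() else zone


def write_location(vars_dir: str = "/etc/yum/vars", timeout: float = 2.0) -> bool:
    """Write the region to <vars_dir>/location; return False when not on EC2."""
    path = os.path.join(vars_dir, "location")
    try:
        with urllib.request.urlopen(ZONE_URL, timeout=timeout) as resp:
            zone = resp.read().decode("ascii")
    except OSError:  # URLError, timeouts and connection errors all derive from OSError
        # Drop any stale value (e.g. one baked into a re-bundled image) so yum
        # sends $location verbatim and MirrorManager returns the normal list.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
        return False
    os.makedirs(vars_dir, exist_ok=True)
    with open(path, "w") as f:
        f.write(region_from_zone(zone))
    return True
```

An init script would call write_location() once during boot. The short timeout keeps boot from stalling on machines where the metadata address does not answer.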

Yum will then pass this value to MirrorManager in an additional location parameter, added by appending &location=$location to the end of the metalink URIs in Fedora's stock repository files. MirrorManager will look up the value and prepend the relevant mirror(s), if any, to the mirror list it returns. Bare-metal machines will lack this file and pass &location=$location verbatim; MirrorManager will find no results for that value and return a standard mirror list.
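To make the fallback behaviour concrete, here is a simplified model of the substitution described above. It is not yum's actual code: real yum also expands built-ins such as $releasever and $basearch itself, whereas this model only consults files in the vars directory and leaves every other $name untouched. The metalink URL follows the format of Fedora's stock repository files.

```python
import os
import re
import tempfile

VAR_RE = re.compile(r"\$([A-Za-z0-9_]+)")


def substitute_yum_vars(text: str, vars_dir: str = "/etc/yum/vars") -> str:
    """Replace $name with the contents of <vars_dir>/name when that file exists."""

    def repl(match):
        path = os.path.join(vars_dir, match.group(1))
        if os.path.isfile(path):
            with open(path) as f:
                return f.read().strip()
        return match.group(0)  # no such file: keep "$name" verbatim

    return VAR_RE.sub(repl, text)


METALINK = ("https://mirrors.fedoraproject.org/metalink"
            "?repo=fedora-$releasever&arch=$basearch&location=$location")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(substitute_yum_vars(METALINK, d))  # bare metal: $location unchanged
        with open(os.path.join(d, "location"), "w") as f:
            f.write("us-east-1")
        print(substitute_yum_vars(METALINK, d))  # on EC2: location=us-east-1
```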

The server-side code to accomplish this is present in MirrorManager's 1.4 branch. MirrorManager ignores parameters it does not recognize, so sending such a URI to a server that does not support this parameter still results in a useful mirror list.
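On the server side, the behaviour amounts to a lookup followed by a prepend. The following is a hypothetical sketch, not MirrorManager's actual code; the LOCATION_MIRRORS table and the plain-list return value are simplifications (MirrorManager itself returns a metalink document).

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlparse

# Hypothetical lookup table: location value -> in-region mirror(s).
LOCATION_MIRRORS: Dict[str, List[str]] = {
    "us-west-1": [
        "http://fedora-mirror-us-west-1.s3.amazonaws.com/fedora/linux/releases/13/Everything/x86_64/os/"
    ],
}


def mirror_list(request_url: str, default_mirrors: List[str]) -> List[str]:
    """Return default_mirrors, preceded by any mirrors for the request's location."""
    params = parse_qs(urlparse(request_url).query)
    # This sketch reads only "location"; any other parameter is ignored.
    location = params.get("location", [""])[0]
    # An unexpanded "$location" or an unknown region simply matches nothing.
    return LOCATION_MIRRORS.get(location, []) + default_mirrors
```

A request from a bare-metal machine carries the literal string $location, which is not a key in the table, so it receives exactly the default list.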

Updating S3 Mirrors

S3 buckets are accessed through a REST API, which makes normal filesystem access difficult and, at best, very slow. Instead, we will use a script that fetches updated packages and metadata files and pushes them to each region's S3 bucket. This script will run either on Fedora's regular infrastructure or on one EC2 instance per region, with each instance using separate credentials.

Idea: Can Amazon's VPC service connect to Fedora's OpenVPN gateway so Release Engineering people can access these EC2 instances normally?
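The page leaves the push script's details open. One ordering concern worth considering is that a client should never fetch a repomd.xml that references files not yet uploaded. The sketch below shows only the planning step, under stated assumptions: both sides are described as {key: checksum} maps (for simple, non-multipart uploads an S3 ETag is the object's MD5), and plan_sync is a hypothetical name. Actually transferring the files is left to whatever S3 client the script uses.

```python
from typing import Dict, List, Tuple


def _publish_rank(key: str) -> int:
    """Order uploads: packages (0), other repodata (1), repomd.xml last (2)."""
    if key.endswith("repodata/repomd.xml"):
        return 2
    if "repodata/" in key:
        return 1
    return 0


def plan_sync(local: Dict[str, str], remote: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare {key: checksum} maps and return (uploads, deletions).

    Uploads put packages first and repodata/repomd.xml last, so a client that
    fetches the new repomd.xml finds everything it references already present.
    Deletions are meant to run only after all uploads have succeeded.
    """
    uploads = sorted(
        (key for key, digest in local.items() if remote.get(key) != digest),
        key=lambda k: (_publish_rank(k), k),
    )
    deletions = sorted(key for key in remote if key not in local)
    return uploads, deletions
```

Deferring deletions also gives clients still holding the previous repomd.xml a chance to finish downloading the packages it lists.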

Action Items

Finalize this Proposal

  • Decide whether to use IAM or AWS sub-accounts. (Done.)
  • Decide who will manage "official" Fedora AWS credentials. (Done.)
  • Decide whether to run the S3 bucket population script on Fedora servers or EC2 instances.
  • Decide how these scripts and possibly EC2 instances will be managed. (Involve Infrastructure in the discussion.)

Implement and Document

  • Ask Amazon officials what support/subsidies they can provide for our finalized proposal.
  • Reserve appropriately-named S3 buckets for Fedora's yum mirrors in each AWS region.
  • Add appropriate ACLs to these S3 buckets.
  • Add &location=$location to stock yum repo files. (See bugs 643185 and 643186, filed against fedora-release and generic-release.)
  • Add AWS region flag support to MirrorManager. (Done.)
  • Document and script repository population and updating.
  • Document when and how to retire S3-based yum mirrors of old releases.