Infrastructure/Mirroring/ru

The Fedora Project has more than 200 mirrors around the world that help distribute Fedora software to users. We highly value our mirrors and their system administrators.

Communication

 * Mailing lists: mirror-list and mirror-list-d (discussion)
 * IRC:  on Freenode
 * Administrative changes: send an email to

How much disk space is required?

 * http://download.fedora.redhat.com/pub/DIRECTORY_SIZES.txt

How can I become a public mirror?
Becoming a public mirror is easy and getting easier. All we ask is enough bandwidth and enough disk space to handle the load. Each Fedora release can take up to 200 GB of disk space, and users downloading the distribution can consume as much bandwidth as is available. A mirror should have at least a 100 Mbit/sec connection; 1 Gbit/sec or faster is better. For the Fedora 8 release, total disk usage on the master server was 1.1 TB, and it keeps growing. A 1-2 TB volume is better suited to maintaining a mirror over the long term. This content is hardlinked; if you can't hardlink (e.g. you're on AFS), you'll need much more disk space. The disk space currently required is listed in the DIRECTORY_SIZES.txt file linked above.


 * 100 Mbit/sec is the rule for countries that already have adequate mirror coverage. We can make exceptions for new mirrors in countries that have few mirrors. Connections to Internet2, National Lambda Rail, GEANT2, RedIRIS, or other such high-speed research and educational networks are always appreciated.

How can I create a private mirror?
Private mirrors are mirrors that sit inside an organization (a company, a school, etc.) and are accessible only to members of that organization. They are meant to speed up distribution of Fedora within an organization where local traffic is much cheaper than Internet traffic.

Private mirrors are similar to public mirrors, with a few exceptions:
 * Private mirrors are never listed in the MirrorManager publiclist pages.
 * Private mirrors cannot pull from the master Fedora download servers. They must pull from another listed public mirror.
 * Private mirrors should include IP netblocks in their MirrorManager configuration. This allows your network-local users to be automatically redirected to your mirror. You may list IP netblocks (e.g. 18.0.0.0/8), or if your network is NAT'd, the hostname of your NAT gateway.
 * Private mirrors are not crawled by the MirrorManager web crawler. As a corollary:
 * Private mirrors must run report_mirror to inform the MirrorManager database of their content. If you don't run report_mirror, your clients will not be automatically redirected.

You may also find it more beneficial to run an IntelligentMirror instead of a full rsync mirror. In this way, only the updates your local users actually need will be cached on your local mirror, saving you the bandwidth of downloading updates you don't actually need.

MirrorManager: the Fedora Mirror Management system
The MirrorManager software keeps track of all the mirrors without requiring a lot of manual text file editing.

Fedora Account System

 * You must have an account in the Fedora Account System . (More info also at  .) You are not required to sign the Contributors License Agreement to merely mirror Fedora content, but you must do so if you wish to contribute to other aspects of Fedora.
 * You must send an email to  stating you would like to become a mirror, your IP address, your location (country), and your outbound bandwidth available for the mirror.
 * You must subscribe to mirror-list and mirror-list-d  (discussion) to be notified of new releases.  Please send the note above to   before subscribing, so we know who you are and can approve your subscription.  Private mirrors need only introduce themselves so they may be approved onto the mailing list.  The other details (IP, country, bandwidth) don't matter for private mirrors.

Registering in MirrorManager
 * Log into mirrormanager using your FAS account.
 * Create a new Site.
 * Create a new Host, and sign up that host for the Categories of content you'll carry, any other site administrators you want, your site's IP addresses used for our Access Control List, and the other details listed there if applicable to you.
 * Please run report_mirror after each rsync run.
 * You may list your site's IP address ranges (Netblocks). Clients coming from an IP address within your netblock will be automatically redirected to your mirror for any content you carry.
 * You may list your site's BGP Autonomous System Number (ASN). Clients on your ASN will be automatically redirected to your mirror for any content you carry. One way to look up your ASN is to query it from the routeviews.org DNS servers. It is like a PTR record lookup, but at a specific server. For example, to look up 143.166.1.1, type:

$ dig txt 1.1.166.143.asn.routeviews.org @archive.routeviews.org
;; ANSWER SECTION:
1.1.166.143.asn.routeviews.org. 86400 IN TXT "3614" "143.166.0.0" "16"

Here, the answer is the first value in the TXT record, 3614.
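
The query name for the routeviews.org lookup is the IPv4 address with its octets reversed, followed by asn.routeviews.org. The following is a minimal sketch of building that name in POSIX shell; the function name asn_query_name is hypothetical and not part of MirrorManager or any Fedora tool.

```shell
# Build the routeviews.org query name for an IPv4 address by reversing
# its four octets. Runs in a subshell so the IFS change does not leak.
asn_query_name() (
    set -f          # disable globbing while the address is split
    IFS=.
    set -- $1       # split "a.b.c.d" into $1..$4
    printf '%s.%s.%s.%s.asn.routeviews.org\n' "$4" "$3" "$2" "$1"
)

# Example use (needs network access, so it is left commented out):
#   dig txt "$(asn_query_name 143.166.1.1)" @archive.routeviews.org
```

For 143.166.1.1 the function prints the same query name used in the dig example for that address.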

Mirroring
The only sane way to do mirroring is to use rsync. Note that options such as -H (hardlinks), --delete-after and --delay-updates are required to ensure your mirror content stays valid even during a new rsync run, until all the new data is available.

rsync -vaH --exclude-from=${EXCLUDES} --numeric-ids --delete --delete-after --delay-updates \
    rsync://download.fedora.redhat.com/fedora-enchilada ${LOCAL_DIR}


 * You may exclude any content you desire, such as architectures, using an EXCLUDES file.
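
As an illustration of what an EXCLUDES file might contain, the sketch below writes one that skips the PowerPC trees and debuginfo directories. The function name write_excludes and the chosen patterns are assumptions for the example only; pick patterns that match the content you actually want to skip.

```shell
# Write a sample rsync exclude file to the path given as $1.
# rsync reads one pattern per line and ignores lines starting with '#'.
write_excludes() {
    cat > "$1" <<'EOF'
# A trailing slash restricts the pattern to directories.
ppc/
ppc64/
debug/
EOF
}

# Example use:
#   EXCLUDES=/etc/fedora-mirror-excludes.txt   # hypothetical location
#   write_excludes "$EXCLUDES"
```

The resulting file is what the --exclude-from option reads in the rsync command.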


 * Please pull from one of the Tier 1 mirrors. See Infrastructure/Mirroring/Tiering .  Instead of using one of the Tier 1 servers, you may wish to pull from another fast mirror that's closer to you.  Contact the respective mirror admins to be added to their ACL.


 * You should sync shortly after 0800 UTC (when rawhide is pushed), 1400 UTC (when bitflips occur), and another 3-5 times per day (updates are manually released).


 * If you are using rsync 3.0 or higher, you can use the --delete-delay option instead of --delete-after, which is reported to provide faster performance.

Running report_mirror
MirrorManager includes a tool, report_mirror, which tells the mirror database that you completed a run and what content you have. This makes generating the yum mirrorlists and all other pages much simpler. Please run report_mirror after every rsync job completes. It can be installed with yum:

yum install mirrormanager-client

Alternatively, get the report_mirror files directly from the git tree, or clone the repository using git:

git clone git://git.fedorahosted.org/git/mirrormanager

or

git clone http://git.fedorahosted.org/git/mirrormanager/

You need both report_mirror and report_mirror.conf, and must edit report_mirror.conf to include the content you're carrying and the path to that content on your disk.
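
One way to make sure report_mirror runs only after a successful sync is to wrap both steps in a small function. This is a sketch under assumptions: the function name sync_and_report and the RSYNC_CMD and REPORT_CMD override variables are hypothetical, and report_mirror is assumed to be on the PATH and to find its configuration file on its own.

```shell
# Run rsync with the given arguments, then report to MirrorManager.
# The report step is skipped, and rsync's exit status returned, if the
# sync fails. RSYNC_CMD and REPORT_CMD are hypothetical hooks that let
# the commands be swapped out, for example in a dry run.
sync_and_report() {
    "${RSYNC_CMD:-rsync}" "$@" || return $?
    "${REPORT_CMD:-report_mirror}"
}

# Example use (needs network access and a configured report_mirror):
#   sync_and_report -vaH --exclude-from="$EXCLUDES" --numeric-ids \
#       --delete --delete-after --delay-updates \
#       rsync://download.fedora.redhat.com/fedora-enchilada "$LOCAL_DIR"
```

Calling the function from cron keeps the sync and the report in a single job, so the database is not told about a run that did not finish.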

Available content
The available content modules by rsync, and their point in the directory tree are:

Fedora Secondary Architectures
Secondary architectures content is not hosted on the master mirror servers, but instead on a different machine, secondary.fedoraproject.org. Should you wish to mirror that content, please do so at a directory such as /pub/fedora-secondary. Do not put this content into the same directory structure as the other Fedora content. The MirrorManager Category for this content is 'Fedora Secondary Arches'.

Fedora Additional Content
Additional freely distributable content of a variety of types will be hosted at http://alt.fedoraproject.org/pub/alt/. Should you wish to mirror that content, please do so at a directory such as /pub/alt. Do not put this content into the same directory structure as the other Fedora content. The MirrorManager Category for this content is 'Fedora Other'.

Fedora Core and Extras for FC6 and earlier
Historical content is now hosted on rsync://archive.fedoraproject.org/fedora-archive. Content here is updated once every 6 months, so please don't sync it multiple times a day.

Please use the above paths as subpaths on your own mirror servers, omitting 'pub' if necessary. We recommend using fedora-enchilada plus fedora-epel if you can. This ensures you have the same directory structure as the master servers, which makes it far easier for users to find the content on your mirror.

DVDs, CDs, and the exploded trees
When a new release is available, it can be bandwidth-efficient to download only the ISOs first (say, the DVD ISOs), then explode those into the directory structure, then run a full normal rsync run. This lets you avoid downloading the same RPMs twice (both on ISOs and as plain RPMs). There's a tool somewhere to help do this.

Regular hardlink runs
While the Fedora release maintainers try to keep as little redundant packaging around as possible, there are some duplicate packages in the tree. For example, when a Fedora Test release comes out, the package set included there looks remarkably like that of the development tree from a few days before. By copying the development tree over into the new Test directory before starting your rsync run, and using, you can avoid downloading all that content a second time.

In addition, it's good practice to run a tool like  on your tree occasionally (say, weekly), to ensure as much of your tree as possible is hardlinked.

Pre-Release: Copying Development tree to new release directory
In the days leading up to a release, either test or final, the development (aka rawhide) tree will stop taking new packages and will closely resemble what winds up in the new release. As a mirror, you can avoid downloading release-tree content that already exists in your copy of the development tree by copying those packages using hardlinks, such as:

cp -lr fedora/linux/development/i386 fedora/linux/test/11-Preview/Fedora/
cp -lr fedora/linux/development/x86_64 fedora/linux/test/11-Preview/Fedora/
cp -lr fedora/linux/development/source fedora/linux/test/11-Preview/Fedora/

and then start the rsync process, which will clean up any changes and fix up the timestamps.
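
The same hardlink copy can be wrapped in a loop so that a missing architecture directory is skipped rather than causing an error. This is only a sketch: the function name seed_release_tree is hypothetical, and it assumes GNU cp (for the -l option) and that both trees live on the same filesystem.

```shell
# Hard-link selected subdirectories of the development tree into a new
# release directory.
#   $1: development tree, $2: new release directory, rest: subdirectories
# The body runs in a subshell so its variables stay local.
seed_release_tree() (
    dev="$1"; rel="$2"; shift 2
    mkdir -p "$rel"
    for arch in "$@"; do
        if [ -d "$dev/$arch" ]; then
            # -l creates hard links instead of copying file data
            cp -lr "$dev/$arch" "$rel/"
        fi
    done
)

# Example use:
#   seed_release_tree fedora/linux/development \
#       fedora/linux/test/11-Preview/Fedora i386 x86_64 source
```

Because the files are hard links, the seeded tree takes almost no extra disk space until rsync replaces files that differ.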

Rsync Configuration (sample)
Larger mirrors, like kernel.org, have slightly customized front ends to rsync (mainly so that they can run a single rsync instance with multiple IP-based vhost configuration files). That said, what follows is a sample rsync configuration file for public syncing; it is not intended for private pre-bitflip mirroring.

[fedora]
        comment        = Fedora - RedHat community project
        path           =
        exclude        = lost+found/
        read only      = true
        max connections = 100
        lock file      = /var/run/rsyncd-mirrors.lock
        uid            =
        gid            =
        transfer logging = yes
        timeout        = 900
        ignore nonreadable = yes
        dont compress  = *.gz *.tgz *.zip *.z *.Z *.rpm *.deb *.bz2
        refuse options = checksum

Things to explicitly note:
 * The path above should be a full path to your fedora directory
 * You really should leave this read-only
 * Make sure your uid/gid are set to public users, not to the user that you run as your sync agent. If you set this to the user who does your syncs you will be inadvertently giving the public full pre-bitflip access.
 * Make sure you have the 'refuse options' set to checksum, your server will be *MUCH* happier with this set, as it will prevent public users from performing a checksum run against you. This can be incredibly I/O abusive, so should not be available to the general public.

Keepalives
HTTP Keepalives should be enabled on your mirror server to speed up client downloads. By default, Fedora's Apache httpd package has keepalives disabled. They should be enabled, with a timeout of at least 2 seconds (the default of 15 seconds might be too high for a heavily loaded mirror server, but 2 seconds is sufficient and appropriate for yum).

KeepAlive On
KeepAliveTimeout 2
MaxKeepAliveRequests 100

Other HTTP servers such as lighttpd have keepalives enabled by default.

Caching of metadata
We don't want caching proxy servers between our mirrors and our end-user systems to cache our yum repository metadata, so add explicit cache handling for the metadata files. (Suggested by the OpenSUSE download redirector.)

Header set Cache-Control "must-revalidate"
ExpiresActive On
ExpiresDefault "now"

Redirecting ISO downloads to FTP
Apache 2.0 and earlier can't serve files larger than 2 GB, which means DVD images won't work (lighttpd doesn't have this limitation). Also, some people find FTP to be more efficient than HTTP for really large files like ISO images. These Rewrite lines redirect all HTTP GET requests for *.iso files to a separate FTP daemon. With this method, the HEAD requests that the MirrorManager crawler issues for *.iso files aren't rewritten, which gives better crawling results.

RewriteCond    %{REQUEST_METHOD} GET
RewriteRule    ^(.*\.iso)$ ftp://myserver/$1  [L,R=301]

Content Types
ISO and RPM files should be served using MIME Content-Type: application/octet-stream. In Apache, this can be done inside a VirtualHost or similar section:

AddType application/octet-stream .iso
AddType application/octet-stream .rpm

Limiting Download Accelerators
Download accelerators will try to open the same file many times, and request chunks, hoping to download them in parallel. This can overload heavily loaded mirror servers, especially on release day. Here are some tricks to thwart such activities.

To limit connections to ISO dirs by some amount per IP:

 MaxConnPerIP 6 

To block range requests, which are what download accelerators rely on:

RewriteEngine on
RewriteCond %{HTTP:Range} [0-9]$
RewriteRule \.iso$ / [F,L]

Similar things can be done with iptables and the recent module, which might give you a little more control over what is being done, either by limiting new connections or by dropping 50% of a user's packets.

Logging Partial Content Downloads
Partial content can be logged correctly using Apache. The format below includes actual counts of bytes received (%I) and sent (%O); this requires the mod_logio module to be loaded.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" %I %O \"%{User-Agent}i\"" combined
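
With this log format, the bytes actually sent for partial downloads can be totalled with a short awk one-liner. The sketch below assumes whitespace-separated fields as produced by the LogFormat above, where the status is field 9 and %O is field 13; it will miscount lines whose request or Referer value contains extra spaces. The function name partial_bytes_sent is hypothetical.

```shell
# Sum the bytes sent (%O, field 13) for responses with status 206
# (Partial Content, field 9) in one or more access logs.
partial_bytes_sent() {
    awk '$9 == 206 { sum += $13 } END { print sum + 0 }' "$@"
}

# Example use (the log path is an assumption):
#   partial_bytes_sent /var/log/httpd/access_log
```

Comparing this total with the sizes of the ISO files gives a rough idea of how much traffic comes from resumed or ranged downloads.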

Pre-bitflip mirroring
Several days before each public release, the content will be staged to the master mirror servers, but with restricted permissions on the directories (generally mode 0750), specifically, not world readable.

Mirror servers should have several different user/group accounts on their server, for running the different public services. Typically you find:
 * HTTP server runs as user apache, group apache
 * FTP server runs as user ftp, group ftp
 * RSYNC server runs as user rsync, group rsync
 * a user account for downloading content from the masters (e.g. user mirror, group mirror).

The user account used to download content from the masters must not be the same as the HTTP, FTP, or RSYNC server accounts. This guarantees that content downloaded with permissions 0750 will not be made available via your public servers yet.

On the morning of the public release, the permissions on the directories on the master servers will change to 0755 - world readable. This is called the bitflip.

Mirrors may either rsync one more time to pick up these new permissions (but won't have to download all the data again), or preferably, can schedule a batch job to bitflip:

$ echo "chmod a+rx /pub/fedora/linux/releases/9" | at '14:45 UTC May 13 2008'
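
Before the bitflip, it can be useful to confirm which directories are still restricted, and afterwards to confirm that none remain. The following sketch lists directories that are not world readable and executable; the function name list_restricted_dirs is hypothetical, and it assumes a find that supports -mindepth (GNU and BSD find both do).

```shell
# List directories below $1 that lack world read+execute permission,
# i.e. content that is still staged (for example mode 0750).
list_restricted_dirs() {
    find "$1" -mindepth 1 -type d ! -perm -005
}

# Example use:
#   list_restricted_dirs /pub/fedora/linux/releases
```

An empty result after the scheduled chmod has run suggests the release tree is fully public.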

Serving content to other mirrors
Tier 1 mirrors will necessarily need to share content with Tier 2 mirrors before the bitflip. This is done by running another instance of the rsync daemon on a different port (e.g. 874), with an Access Control List to prevent public downloads, running as a user in the same group as the account that downloaded the content (e.g. user mirror, group mirror), which has group read/execute permissions on the still-private content.

Tier 1 mirrors use different authentication methods for granting access to these non-public downloads, ranging from maintaining IP-based ACLs to assigning username/password combinations to mirrors wishing to sync from them. Each method has advantages and disadvantages. The IP list is simpler from a MirrorManager perspective, as MirrorManager can give you the list of IPs, but it can be more difficult to automate, as rsync's configuration file does not allow the ACL to be stored in a separate file. Usernames and passwords can be more versatile, as mirroring sites can change IPs without notifying you, but it is easier for those credentials to leak and be misused.

Map
Worldwide public mirrors as of 23-May-2009.

Recognition
This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com/.