Features/LessFS

From FedoraProject

(This should go in the "incomplete" category unless it has someone working on it. Read Features/Policy for more information on how this works.)
 
(13 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{admon/important | Comments and Explanations | The page source contains comments providing guidance to fill out each section.  They are invisible when viewing this page.  To read it, choose the "edit" link.<br/> '''Copy the source to a ''new page'' before making changes!  DO NOT EDIT THIS TEMPLATE FOR YOUR FEATURE.'''}}
+
= LessFS =
  
{{admon/important | Set a Page Watch| Make sure you click ''watch'' on your new page so that you are notified of changes to it by others, including the Feature Wrangler}}
+
http://www.lessfs.com/
 
+
{{admon/note | All sections of this template are required for review by FESCo. If any sections are empty it will not be reviewed }}
+
 
+
 
+
<!-- All fields on this form are required to be accepted by FESCo.
+
We also request that you maintain the same order of sections so that all of the feature pages are uniform.  -->
+
 
+
<!-- The actual name of your feature page should look something like: Features/YourFeatureName.  This keeps all features in the same namespace -->
+
 
+
= Feature Name =
+
LessFS
+
  
 
== Summary ==
 
== Summary ==
LessFS is a filesystem deduplication project.  The aim is to reduce disk usage where filesystem blocks are identical by only storing 1 block and using pointers to the original block for copies.  This method of storage is becoming popular in Enterprise solutions for reducing disk backups and minimising virtual machine storage in particular.
+
LessFS is a [http://en.wikipedia.org/wiki/Data_deduplication data deduplication] project.  The aim is to reduce disk usage where filesystem blocks are identical by storing only one copy of each block and using pointers to that block for the duplicates.  This method of storage is becoming popular in enterprise solutions, in particular for reducing the size of disk backups and minimising virtual machine storage.
  
 
== Owner ==
 
== Owner ==
<!--This should link to your home wiki page so we know who you are-->
 
 
* Name: [[User:drunkahol| Duncan Innes]]
 
* Name: [[User:drunkahol| Duncan Innes]]
  
Line 24: Line 12:
  
 
== Current status ==
 
== Current status ==
* Targeted release: ?
+
* Targeted release: [[Releases/15 | Fedora 15]]
 
* Last updated: 2010-11-12
 
* Last updated: 2010-11-12
 
* Percentage of completion: 0%
 
* Percentage of completion: 0%
  
<!-- CHANGE THE "FedoraVersion" TEMPLATES ABOVE TO PLAIN NUMBERS WHEN YOU COMPLETE YOUR PAGE. -->
+
Looks like this is going to die as the RPM build doesn't look like it's going anywhere. Should I remove this as a target for [[Releases/15 | Fedora 15]]? If so, how do I do that?
  
 
== Detailed Description ==
 
== Detailed Description ==
<!-- Expand on the summary, if appropriate. A couple sentences suffices to explain the goal, but the more details you can provide the better. -->
+
Data deduplication is often used for backup purposes and for virtual machine image storage. lessfs determines whether data is redundant by calculating a unique (192-bit) Tiger hash of each block of data that is written. When lessfs has determined that a block of data needs to be stored, it first compresses the block with LZO or QuickLZ compression. The combination of these two techniques results in a very high overall compression rate for many types of data. Multimedia files such as mp3, avi or jpg files cannot be compressed by lessfs when they are stored only once on the filesystem.
 +
 
 +
http://www.lessfs.com/wordpress/?page_id=50
  
 
== Benefit to Fedora ==
 
== Benefit to Fedora ==
This will bring an as yet unavailable enterprise tool to Fedora.  Storage is becoming the biggest consumer of energy in the datacentre.  De-duplication will help bring that power and cost requirement down.
+
This will bring an as yet unavailable enterprise tool to Fedora.  Storage is becoming the biggest consumer of energy in the datacentre.  De-duplication will help bring that power and cost requirement down.  Inclusion of LessFS (even as a technology preview) will improve the coverage of Fedora and help to push forward an open source method of de-duplication.
  
 
== Scope ==
 
== Scope ==
Attempts have been made to package as an RPM but seem to have stalled.
+
LessFS adds functionality that allows deduplicated file systems.  The project is under active development, with regular and frequent releases.
  
 
== How To Test ==
 
== How To Test ==
<!-- This does not need to be a full-fledged document. Describe the dimensions of tests that this feature is expected to pass when it is done. If it needs to be tested with different hardware or software configurations, indicate them. The more specific you can be, the better the community testing can be.  
+
No special hardware requirements.
 +
 
 +
A Package Review Request is currently sitting in Bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=530473) but appears to have stalled.
 +
 
 +
Once the package is installed, a filesystem can then be created. 
 +
 
 +
Example:
 +
 
 +
Create a filesystem /data/orig as a normal partition.
 +
Create a filesystem /data/less as a de-duplicated fuse filesystem using LessFS.
  
Remember that you are writing this how to for interested testers to use to check out your feature - documenting what you do for testing is OK, but it's much better to document what *I* can do to test your feature.
+
Create a directory and file structure in /data/orig that uses multiple copies of a few large files.  Include renamed copies in the same directory and same-name copies in different directories.  Files should be multiple blocks in size for optimum testing.  Data can come from /dev/urandom or similar.  Random data does not compress, so any saving measured this way comes from deduplication alone; use compressible data such as text to exercise the LZO/QuickLZ compression as well.  Once the /data/orig filesystem is of a good size for testing (multiple GB is better, but not strictly necessary), copy all the data to /data/less.
  
A good "how to test" should answer these four questions:
+
An rsync should show that the /data/orig and /data/less filesystems are identical, but checking the /data/less directory will show less disk space usage.
  
0. What special hardware / data / etc. is needed (if any)?
+
In my view, this package is not aimed at filesystems requiring maximum read/write speed; it is better suited to filesystems with a low rate of change. Filesystems with high capacity requirements benefit the most.
1. How do I prepare my system to test this feature? What packages
+
need to be installed, config files edited, etc.?
+
2. What specific actions do I perform to check that the feature is
+
working like it's supposed to?
+
3. What are the expected results of those actions?
+
-->
+
  
 
== User Experience ==
 
== User Experience ==
De-duplication will be noticeable to target users by greatly reducing the disk space requirements for backups to disk and for virtual machine storage.  Greater reductions are seen where many images/backups share a common data set.
+
Deduplication will be noticeable to target users by greatly reducing the disk space requirements for backups to disk and for virtual machine storage.  Greater reductions are seen where many images/backups share a common data set.
  
 
== Dependencies ==
 
== Dependencies ==
<!-- What other packages (RPMs) depend on this package?  Are there changes outside the developers' control on which completion of this feature depends?  In other words, completion of another feature owned by someone else and might cause you to not be able to finish on time or that you would need to coordinate?  Other upstream projects like the kernel (if this is not a kernel feature)? -->
+
* fuse
  
 
== Contingency Plan ==
 
== Contingency Plan ==
Line 64: Line 57:
  
 
== Documentation ==
 
== Documentation ==
<!-- Is there upstream documentation on this feature, or notes you have written yourself?  Link to that material here so other interested developers can get involved. -->
 
 
* http://www.lessfs.com/wordpress/
 
* http://www.lessfs.com/wordpress/
  
 
== Release Notes ==
 
== Release Notes ==
<!-- The Fedora Release Notes inform end-users about what is new in the release. Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
+
* Filesystem for FUSE that allows for high-performance inline data de-duplication, using Tokyo Cabinet for the database.
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this feature, indicate them here.  You can also link to upstream documentation if it satisfies this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->
+
*
+
  
 
== Comments and Discussion ==
 
== Comments and Discussion ==
* See [[Talk:Features/YourFeatureName]] <!-- This adds a link to the "discussion" tab associated with your page.  This provides the ability to have ongoing comments or conversation without bogging down the main feature page -->
+
* See [[Talk:Features/LessFS]]
  
  
 
[[Category:FeaturePageIncomplete]]
 
[[Category:FeaturePageIncomplete]]
<!-- When your feature page is completed and ready for review -->
 
<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->
 
<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->
 
<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->
 

Latest revision as of 11:43, 24 March 2011

LessFS

http://www.lessfs.com/

Summary

LessFS is a data deduplication project. The aim is to reduce disk usage where filesystem blocks are identical by storing only one copy of each block and using pointers to that block for the duplicates. This method of storage is becoming popular in enterprise solutions, in particular for reducing the size of disk backups and minimising virtual machine storage.

Owner

  • Name: Duncan Innes
  • Email: duncan AT innes DOT net

Current status

  • Targeted release: Fedora 15
  • Last updated: 2010-11-12
  • Percentage of completion: 0%

Looks like this is going to die as the RPM build doesn't look like it's going anywhere. Should I remove this as a target for Fedora 15? If so, how do I do that?

Detailed Description

Data deduplication is often used for backup purposes and for virtual machine image storage. lessfs determines whether data is redundant by calculating a unique (192-bit) Tiger hash of each block of data that is written. When lessfs has determined that a block of data needs to be stored, it first compresses the block with LZO or QuickLZ compression. The combination of these two techniques results in a very high overall compression rate for many types of data. Multimedia files such as mp3, avi or jpg files cannot be compressed by lessfs when they are stored only once on the filesystem.

http://www.lessfs.com/wordpress/?page_id=50
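
The hash-then-compress scheme described above can be illustrated with a small in-memory sketch. This is not lessfs code: the class name DedupStore and the 4096-byte block size are hypothetical, a truncated SHA-256 digest stands in for the 192-bit Tiger hash (Python's hashlib does not guarantee Tiger), and zlib stands in for LZO/QuickLZ.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


class DedupStore:
    """Toy block store: each unique block is kept once, compressed."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # digest -> compressed block data
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            # 24 bytes = 192 bits; SHA-256 is a stand-in for Tiger here.
            digest = hashlib.sha256(block).digest()[:24]
            if digest not in self.blocks:
                # Only previously unseen blocks are compressed and stored.
                self.blocks[digest] = zlib.compress(block)
            digests.append(digest)
        # The file itself is just a list of pointers to stored blocks.
        self.files[name] = digests

    def read(self, name):
        return b"".join(
            zlib.decompress(self.blocks[d]) for d in self.files[name]
        )

    def stored_bytes(self):
        """Bytes actually held after deduplication and compression."""
        return sum(len(c) for c in self.blocks.values())
```

Writing the same data under two names adds a second pointer list but no new blocks, which is the effect the description relies on for backups and virtual machine images.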

Benefit to Fedora

This will bring an as yet unavailable enterprise tool to Fedora. Storage is becoming the biggest consumer of energy in the datacentre. De-duplication will help bring that power and cost requirement down. Inclusion of LessFS (even as a technology preview) will improve the coverage of Fedora and help to push forward an open source method of de-duplication.

Scope

LessFS adds functionality that allows deduplicated file systems. The project is under active development, with regular and frequent releases.

How To Test

No special hardware requirements.

A Package Review Request is currently sitting in Bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=530473) but appears to have stalled.

Once the package is installed, a filesystem can then be created.

Example:

Create a filesystem /data/orig as a normal partition. Create a filesystem /data/less as a de-duplicated FUSE filesystem using LessFS.

Create a directory and file structure in /data/orig that uses multiple copies of a few large files. Include renamed copies in the same directory and same-name copies in different directories. Files should be multiple blocks in size for optimum testing. Data can come from /dev/urandom or similar. Random data does not compress, so any saving measured this way comes from deduplication alone; use compressible data such as text to exercise the LZO/QuickLZ compression as well. Once the /data/orig filesystem is of a good size for testing (multiple GB is better, but not strictly necessary), copy all the data to /data/less.

An rsync should show that the /data/orig and /data/less filesystems are identical, but checking the /data/less directory will show less disk space usage.
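
The test procedure above could be scripted roughly as follows. This is a sketch under assumptions rather than an official test: the function names, file names and default sizes are hypothetical, and it assumes both filesystems are already created and mounted. It covers building the duplicated tree, copying it, and checking that the two trees are identical; the space actually consumed on the LessFS side still has to be checked separately, for example with du or df.

```python
import filecmp
import os
import shutil


def build_test_tree(root, file_size=1 << 20, copies=3):
    """Create one random file plus renamed and relocated copies of it."""
    os.makedirs(root, exist_ok=True)
    master = os.path.join(root, "master.bin")
    with open(master, "wb") as fh:
        fh.write(os.urandom(file_size))
    for i in range(copies):
        # Renamed copy in the same directory.
        shutil.copyfile(master, os.path.join(root, f"copy{i}.bin"))
        # Same-name copy in a different directory.
        subdir = os.path.join(root, f"dir{i}")
        os.makedirs(subdir, exist_ok=True)
        shutil.copyfile(master, os.path.join(subdir, "master.bin"))


def trees_identical(left, right):
    """Recursively compare two directory trees byte for byte."""
    cmp = filecmp.dircmp(left, right)
    if cmp.left_only or cmp.right_only or cmp.funny_files:
        return False
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    return all(
        trees_identical(os.path.join(left, d), os.path.join(right, d))
        for d in cmp.common_dirs
    )


def run_check(orig, less, file_size=1 << 20, copies=3):
    """Populate orig, copy everything to less, and compare the trees."""
    build_test_tree(orig, file_size=file_size, copies=copies)
    shutil.copytree(orig, less, dirs_exist_ok=True)  # needs Python 3.8+
    return trees_identical(orig, less)
```

Calling run_check("/data/orig", "/data/less") with the mount points from the example should return True if the copy is faithful.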

In my view, this package is not aimed at filesystems requiring maximum read/write speed; it is better suited to filesystems with a low rate of change. Filesystems with high capacity requirements benefit the most.

User Experience

Deduplication will be noticeable to target users by greatly reducing the disk space requirements for backups to disk and for virtual machine storage. Greater reductions are seen where many images/backups share a common data set.

Dependencies

  • fuse

Contingency Plan

None necessary - this is a new feature and does not change any current part of Fedora

Documentation

  • http://www.lessfs.com/wordpress/

Release Notes

  • Filesystem for FUSE that allows for high-performance inline data de-duplication, using Tokyo Cabinet for the database.

Comments and Discussion