From Fedora Project Wiki
Latest revision as of 01:42, 11 April 2015


Project Description

BTRFS is a new, actively developed file system with various advanced features. I wish to implement a content-based storage mode for the BTRFS file system. This project is also mentioned in the TODO list on the BTRFS ideas page.

In some applications, such as Internet content caches, the data is more often than not read-only. In such cases, lookup time is the most important metric, and storing data in a conventional path-based manner is very inefficient. In content-based storage mode, data is stored on disk and addressed solely by the hash of its content. Lookup is also hash-based: a single lookup by hash replaces a path walk, which makes it very fast. Another advantage of hash-based storage is that duplicate data cannot exist, because identical content always produces the same hash and is stored only once.

My research at CMU aims at building content caches for routers (https://github.com/harshadjs/xia-content-cache), which requires a file system that supports such a storage mode. Working on this project over the summer would serve the interests of both the BTRFS community and the research at CMU.

Biography and Technical Background

I am a Computer Science graduate student at Carnegie Mellon University, with research interests primarily in computer networks. I use Linux daily and am passionate about open-source software development.

In my undergraduate years, I worked on an open-source Linux kernel project, "Snapshots for Ext4 filesystem". Patches were sent to the Ext4 community for review, and my contribution to the project was mentioned at http://lwn.net/Articles/442078/.

We were interested in extending the Ext4 snapshots project, so I participated in Google Summer of Code 2011. My proposal, "Snapshot revert feature for Ext4", was accepted by the Fedora Project, and I completed the project successfully. I look forward to continuing this interest and my association with the Fedora Project by submitting the proposal "Content-storage mode for BTRFS" for 2015.

I worked for three years (2011-2014) at the Wi-Fi technology startup "AirTight Networks", in the Linux device drivers team.

I then joined Carnegie Mellon University in May 2014, where my main area of study is computer networks.

I am highly fluent in C and Linux kernel programming, and it is work I love to do.

Goals

  • 75% Goal
    • Create a new "Content" tree. This tree should store hashes of all the extents in the file system.
    • Create a "File Hash" tree. This tree will store the mapping from the hash of a file to its inode.
    • Provide an option to enable or disable content-storage mode at mount time or mkfs time (TBD).
    • Implement all the reference-counting mechanisms for extents in the Content tree.
  • 100% Goal
    • Intercept writes and check whether the data being written is already in the Content tree.
    • Intercept reads:
      • Given the hash of a file, look up the file's inode in the "File Hash" tree.
    • Extend the debugging methods available in btrfs to support debugging content trees (which methods exist is still to be determined).
  • 125% Goal
    • Provide various mount-time configuration options, such as:
      • Remove or don't remove extents when their reference count drops to 0 (especially useful for our routing application).
      • Verify or trust the checksums of extents.

Milestones of the Project

  • M1: Understand the design and code of Btrfs, focusing especially on how the current extent trees, subvolume trees, and snapshot trees are set up initially. Study the on-disk data structures; we will most likely need to add some bits to the super-block, for example a "content-storage-mode-on/off" flag.
  • M2: Understand and identify the code areas where the hooks should be placed. Hooks are needed for:
    • Intercepting writes
    • Reading extents
    • Debugging interfaces
  • M3: Write a detailed design draft covering the overall goal, the required on-disk changes, and the functions to be modified. Share the draft with the BTRFS community and gather their views.
  • M4: Implement and test the 75% goal.
  • M5: Implement and test the 100% goal.
  • M6: Implement and test the 125% goal (if time permits).
  • M7: Write documentation for the final product.

Plan of action

  • By the end of week 1: M1, M2
  • By the end of week 2: M3
  • (Midterm) By the end of week 5: M4
  • By the end of week 7: M5
  • By the end of week 9: M6
  • (End) By the end of week 10: M7

Why choose me?

  • Past successful GSoC student (2011).
  • Past experience working with the open-source community.
  • Strong understanding of file systems, the C programming language, the UNIX philosophy, and Linux.
  • Passionate about contributing to Linux.

Time commitment

Apart from this project, I have research commitments at CMU, so I expect to spend at least 30 hours per week on this project. My final exams end on 13 May 2015, and I hope to start right after that. I will be visiting my hometown (Pune, India) around the end of May and the first week of June; that is the only period when my progress may slow down a little. For the rest of the summer, I will be fully focused on the project.