From Fedora Project Wiki

Javi Roman: Twitter Linkedin Photography

Fedora Big Data Package Ecosystem

Fedora Hosted Packages
Package | Packaged Version | Upstream Version | Sources
Apache Hadoop | 2.4.1 | 2.7.0 | http://pkgs.fedoraproject.org/cgit/hadoop.git/
Apache HBase | 0.98.3 | 1.0.1 | http://pkgs.fedoraproject.org/cgit/hbase.git/
Apache Hive | 0.12.2 | 1.1.0 | http://pkgs.fedoraproject.org/cgit/hive.git/
Apache Pig | 0.13.10 | 0.14.0 | http://pkgs.fedoraproject.org/cgit/pig.git/
Apache Zookeeper | 3.4.6 | 3.4.6 | http://pkgs.fedoraproject.org/cgit/zookeeper.git/
Apache Oozie | 4.0.1 | 4.1.0 | http://pkgs.fedoraproject.org/cgit/oozie.git/
Apache Ambari | 1.5.1 | 2.0.0 | http://pkgs.fedoraproject.org/cgit/ambari.git/
Apache Accumulo | 1.6.1 | 1.6.2 | http://pkgs.fedoraproject.org/cgit/accumulo.git/
Apache Mesos | 0.22.1 | 0.22.1 | http://pkgs.fedoraproject.org/cgit/mesos.git/
Apache Solr | 4.10.4 | 5.1.0 | http://pkgs.fedoraproject.org/cgit/solr.git/
Apache Spark | 0.9.1 | 1.3.1 | http://pkgs.fedoraproject.org/cgit/spark.git/
AMPLab Tachyon | 0.99 | 0.6.4 | http://pkgs.fedoraproject.org/cgit/tachyon.git
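
To spot which packages trail upstream, the two version columns can be compared mechanically. The following is a minimal sketch, assuming GNU sort with the -V (version sort) option is available. The function name is_outdated is hypothetical, and version sort only approximates RPM's own version comparison. The sample versions are copied from the table above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the packaged version
# sorts strictly before the upstream version.
is_outdated() {
    packaged="$1"
    upstream="$2"
    [ "$packaged" != "$upstream" ] &&
        [ "$(printf '%s\n%s\n' "$packaged" "$upstream" | sort -V | head -n 1)" = "$packaged" ]
}

# Sample rows from the table above.
is_outdated 2.4.1 2.7.0 && echo "hadoop: packaged 2.4.1 is behind upstream 2.7.0"
is_outdated 3.4.6 3.4.6 || echo "zookeeper: packaged 3.4.6 matches upstream"
```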
Ongoing packages
Package | Packaged Version | Upstream Version | Status | Sources
Apache Flume | 1.5.0 | 1.5.0 | Partially supported | https://github.com/fedora-bigdata-rpms/flume-rpm
Cloudera Kite SDK | 1.0.0 | 1.0.0 | | https://gil.fedorapeople.org/kite.spec
Apache Crunch | 0.11.0 | 0.11.0 | | https://github.com/fedora-bigdata-rpms/crunch-rpm
Apache Tez | 0.5.3 | 0.6.0 | | https://github.com/fedora-bigdata-rpms/tez-rpm
Apache Kafka | 0.8.0 | 0.8.2.1 | | https://github.com/fedora-bigdata-rpms/kafka-rpm
Apache Tajo | 0.10.0 | 0.10.0 | | https://gil.fedorapeople.org/tajo.spec
Apache Jena | | | |
Cascading | 2.6.3 | 2.6.3 | | https://gil.fedorapeople.org/cascading.spec

Apache Flume package status

Package status

The package builds under the following assumptions (we are working on these issues):

  • The code is not ready for Thrift v0.9.1, the version available in Fedora 21. However, Flume can be built using the legacy Thrift built-in code shipped in the upstream Flume TGZ.
  • The ElasticSearch Sink is disabled
  • The Morphline Solr Sink is disabled
  • The Twitter Source is disabled
  • The Kite Dataset Sink is disabled

Testing the package

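# Fetch the packaging repository
git clone https://github.com/fedora-bigdata-rpms/flume-rpm.git
cd flume-rpm
# Download the upstream sources listed in the spec file
spectool -g flume.spec
# Build the source RPM, reading sources from and writing the SRPM to the current directory
rpmbuild -bs --nodeps --define "_sourcedir ." --define "_srcrpmdir ." flume.spec
# Rebuild the source RPM in a clean mock chroot
sudo mock flume-1.5.2-1.fc21.src.rpm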
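
The SRPM file name in the last step depends on the Version and Release fields in flume.spec, so it changes whenever the spec is updated. The following is a minimal sketch, assuming a POSIX shell, that picks up whichever source RPM was most recently written to a directory instead of hard-coding the name. The helper name latest_srpm is hypothetical and is not part of the flume-rpm repository.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified *.src.rpm
# in the given directory (default: current directory).
latest_srpm() {
    dir="${1:-.}"
    # ls -t lists newest first; stderr is silenced when nothing matches,
    # in which case the function prints nothing.
    ls -t "$dir"/*.src.rpm 2>/dev/null | head -n 1
}
```

After the rpmbuild step, running sudo mock "$(latest_srpm .)" would then rebuild whichever SRPM was just generated.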

Dependency packages

  • To build Flume with all features enabled, the following dependency packages are needed. Their current status is:
Package | Bugzilla | Status
irclib | RHBZ #976049 | Package is available in Rawhide and in Fedora 21 as an update
mapdb | RHBZ #1178861 | Package is available in Rawhide and in Fedora 21 as an update
asynchbase | No BZ ticket | Not yet submitted for review in Bugzilla
suasync | No BZ ticket | asynchbase dependency. Not yet submitted for review in Bugzilla
kite | RHBZ #1179355 | Patched in order to support the Fedora Guava version (partial support)
parquet | RHBZ #1073017 | kite package dependency. Package is available in Rawhide and was submitted to Fedora 22 and 21 as an update
parquet-format | RHBZ #1073014 | parquet package dependency. Package is available in Rawhide and was submitted to Fedora 22 and 21 as an update
maxmind-db-java | RHBZ #1179309 | kite package dependency. Package is available in Rawhide and in Fedora 21 as an update
ua-parser-java | RHBZ #1179342 | kite package dependency. Package is available in Rawhide and in Fedora 21 as an update
elasticsearch | RHBZ #902086, RHBZ #1181564 | Package is available in Rawhide and in Fedora 22 as an update
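
The RHBZ numbers above refer to bugs in Red Hat Bugzilla. The following is a small sketch, assuming the standard show_bug.cgi URL scheme of bugzilla.redhat.com, that prints the ticket URL for each dependency so its status can be checked quickly. The function name rhbz_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a Red Hat Bugzilla bug number into its URL.
rhbz_url() {
    printf 'https://bugzilla.redhat.com/show_bug.cgi?id=%s\n' "$1"
}

# Bug numbers taken from the dependency table above.
for bug in 976049 1178861 1179355 1073017 1073014 1179309 1179342 902086 1181564; do
    rhbz_url "$bug"
done
```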

Apache Storm package status

sources

Apache Kafka package status

sources

Apache Tez package status

sources

Apache Crunch package status

sources