From Fedora Project Wiki

Revision as of 18:45, 1 March 2014

Spark packaging

Background

See Scala packaging for details on Fedora support for the Scala toolchain. Briefly, Scala 2.10.3 is available in F19, F20, and Rawhide, and sbt 0.13.1 is under review in Fedora.

Spark 0.9.0 works with Scala 2.10.3, and the upstream source repository currently has support for SBT 0.13.x.

Current Status

Apache Spark is under review in Fedora for Rawhide (f21). The Fedora Spark package includes Spark Core, MLlib, GraphX, Bagel, and Spark Streaming, but currently (as of spark-0.9.0-0.2) it differs from upstream Spark in the following ways:

  • it builds against Akka 2.3.0-RC2 instead of Akka 2.2.3 (this entails some trivial API changes)
  • it doesn't support Kryo serialization yet (because Chill isn't yet available in Fedora), which limits some Spark Streaming functionality
  • it doesn't support functionality dependent on Clearspring stream-lib, since stream-lib depends on bundled code and can't yet be packaged for Fedora; most notably, the countApproxDistinctByKey methods on RDDs aren't supported
  • it doesn't support Mesos, since the Fedora Mesos package doesn't include Java support

We're working on addressing these limitations.
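
Until the stream-lib limitation is resolved, code that only needs per-key distinct counts can compute them exactly with operations that don't depend on stream-lib. The following is a minimal sketch, not part of the Fedora package or of upstream Spark: the object name ExactDistinctByKey, the helper countDistinctByKey, and the sample data are hypothetical. It assumes the Spark 0.9 RDD API (distinct, map, and reduceByKey via the SparkContext._ implicits) and trades the memory savings of the approximate method for exact results, so it may not suit very large key spaces.

```scala
import scala.reflect.ClassTag

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicits that add pair-RDD methods such as reduceByKey
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not provided by Spark or by the Fedora package.
object ExactDistinctByKey {
  // Exact count of distinct values per key:
  // drop duplicate (key, value) pairs, then count the remaining pairs for each key.
  def countDistinctByKey[K: ClassTag, V: ClassTag](pairs: RDD[(K, V)]): RDD[(K, Long)] =
    pairs.distinct().map { case (k, _) => (k, 1L) }.reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local master and application name are illustrative assumptions.
    val sc = new SparkContext("local", "exact-distinct-by-key")
    try {
      val visits = sc.parallelize(Seq(("a", 1), ("a", 1), ("a", 2), ("b", 7)))
      countDistinctByKey(visits).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because distinct shuffles every unique pair, this is only a stopgap for workloads where an exact answer is affordable.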

Most of the Scala packages build with tests disabled (because test dependencies are unavailable) or with varying degrees of modification to the upstream build process (because of differing dependency versions or unavailable sbt plugins). If you're looking for an easy way to get involved, packaging some of these missing dependencies would be a great place to start. See the list under Scala packaging or find willb on IRC for more information.

Dependencies

Spark requires Scala and SBT in order to build. (Note that there is a Maven build option as well, but it requires artifacts that need to be built with SBT.)

The easiest and most up-to-date place to see the dependency list is in the Spark repository itself, but here we will call out some notable dependencies that aren't already in Fedora:

Dependencies
| Project | State | Review BZ | Packager | Notes |
| sbt | under review | sbt-package BZ | willb | |
| lift-json | not necessary any more with this patch (either carried or integrated into upstream) | | willb | |
| json4s | available to review | json4s-package BZ | willb | |
| akka | gil and willb are looking at this | | gil | |
| Squeryl | awaiting a reviewer | 1057770 | willb | dependency of lift; no longer strictly necessary but still nice to have for the Scala ecosystem |
| scalaz | awaiting a reviewer | 1055809 | willb | dependency of lift-json; no longer strictly necessary but still nice to have for the Scala ecosystem |
| metrics | available in F20 | 861502 | gil | Coda Hale's metrics (Java/Maven build) |