From Fedora Project Wiki

Revision as of 13:51, 26 September 2013

From the project site: "Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL."

The Fedora Big Data SIG is investigating what is required to package the latest version of Hive in Fedora, now that Hadoop 2.x has been packaged. Although Hive has a significant dependency on Hadoop, the project is not Maven-based; it is built using Ant and Ivy. The Packaging:Java xmvn tooling support in Fedora therefore does not directly apply to the Hive build. In many ways this is a simplification rather than a challenge, since a local file-system Ivy resolver is relatively easy to configure.
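As a minimal sketch of the local file-system resolver mentioned above, an Ivy settings file along the following lines could point resolution at system-installed jars. The resolver name fedora-local is hypothetical, and the sketch assumes that jars are installed unversioned directly under /usr/share/java; packages that install into subdirectories would need additional artifact patterns. How such a file would be wired into Hive's own build is not covered here.

```xml
<ivysettings>
  <!-- Use the local resolver for every module (resolver name is hypothetical) -->
  <settings defaultResolver="fedora-local"/>
  <resolvers>
    <filesystem name="fedora-local">
      <!-- Assumption: unversioned jars directly under /usr/share/java.
           No ivy pattern is given, so modules resolve without a descriptor
           and therefore without transitive dependency metadata. -->
      <artifact pattern="/usr/share/java/[artifact].[ext]"/>
    </filesystem>
  </resolvers>
</ivysettings>
```

Because no module descriptors are supplied, each dependency is matched purely by artifact name, so any version checks would have to be done separately.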

Static analysis of the build (Ant provides nothing comparable to the Maven dependency plugin) shows a group of dependencies that are currently missing from Fedora and that block building Hive against Fedora-only installed versions. Many other dependencies are available but are not necessarily version-compatible. As with the Changes/Hadoop outline, those incompatibilities can hopefully be mitigated in the Hive source where possible.

Version 0.11, the latest release, was built from source against the Fedora Hadoop target (2.0.5-alpha) using:

ant very-clean package -Dhadoop.version=2.0.5-alpha -Dhadoop-0.23.version=2.0.5-alpha -Dhadoop.mr.rev=23 -DenhanceModel.notRequired=true

The full Hive dependency list is captured here, but the following table outlines the missing dependencies. The ones in bold are deemed hard dependencies and must be packaged.

Missing/Questionable Dependencies
| Project | State | Review BZ | Packager | Notes |
|---|---|---|---|---|
| avro-ipc, avro-mapred | Blocked | RHBZ #1009170 | ricardo | Although avro 1.6.2 is packaged, it does not include the ipc and mapred jars. IPC appears to only apply to 0.20 shim. MapRed is used by an Avro reader/input/output feature in QL and is based on the legacy mapred API (i.e., org.apache.hadoop.mapred). |
| commons-httpclient | Available | | | Jakarta? |
| datanucleus-core | Review | | pmackinn, gil | Forms backbone of metastore layer for different data sinks. Upstream project at http://www.datanucleus.org/ |
| datanucleus-api-jdo | Review | | pmackinn, gil | JDO implementation for datanucleus |
| datanucleus-rdbms | Review | | pmackinn, gil | RDBMS plugin adapter for datanucleus |
| ftplet-api | Available | | | Appears to only apply to 0.20 shim |
| ftpserver-core, ftpserver-deprecated | Available | | | Appears to only apply to 0.20 shim |
| geronimo-j2ee-management_1.1_spec | Available | | | Hopefully can substitute jboss-j2eemgmt-1.1-api package |
| hbase | Active | | rrati | hbase-handler can be compiled out but seems like a significant omission |
| high-scale-lib | Review | RHBZ #865893 | gil | |
| httpclient, httpcore | Available | | | Same as above...Jakarta? |
| javolution | Review | RHBZ #1009153 | pmackinn | Used by the QL classes: a hard dependency |
| jdo-api | Review | RHBZ #1011696 | pmackinn, gil | Dependency for datanucleus-api-jdo. CANNOT substitute existing jdo2-api. |
| libthrift, libfb303 | Review | RHBZ #982285, RHBZ #1000563 | willb | Will Benton has some RPM artifacts at http://freevariable.com/thrift/ |
| metrics-core | Review | RHBZ #861502 | gil | |
| pig | Available | | | Test and source imports of Pig classes; however, they appear to be in the adapter space, so it may be possible to defer. |
| stax-api | Available | | | Need to sort out which of available stax-api packages could fit (bea, stax2) |
| tempus-fugit | Review | RHBZ #1009654 | gil | Concurrency library. May only be a test dep. Upstream at http://tempusfugitlibrary.org/ |