From Fedora Project Wiki

Latest revision as of 16:51, 8 April 2014

This is an evaluation by the Big Data SIG of the issues that need to be addressed in order to get Ambari into Fedora. This work was started on the 1.2.5 branch, which didn't support Hadoop 2.x at the time. The current stable release (1.4.4) does.

Note: A strawman spec and source RPM can be found at http://pmackinn.fedorapeople.org/ambari/. The Fedora Ambari server can be used, with appropriate care, to provision a CentOS 6 Ambari agent from Hortonworks.

Issues To Be Resolved

Missing node.js Dependencies

The Ambari build uses brunch and other node.js parts to generate static web content. A significant part of the development dependency chain for brunch and Ember is not in Fedora, and 53 node.js projects would need to be packaged (listed at http://pmackinn.fedorapeople.org/ambari/ambari-normalized.txt). There are a few ways to handle the node.js dependency chain:

  1. Package all the node.js dependencies as individual rpms
  2. Package all the node.js dependencies as a single rpm
     Note: This is at best a stop-gap solution that may not pass Fedora packaging review. If this is a feasible path, a process will need to be established to break out these bundled dependencies over time. The long-term goal for bringing all of these node.js bits into Fedora should be individual packages.
  3. Work with upstream to remove the need to generate the static web content
     Note: Upstream is now including the generated web content starting with the 1.4.4 release. However, this content still bundles JS libraries such as Ember, d3, cubism, and more (see http://pmackinn.fedorapeople.org/ambari/ambar-vendor-js.txt), which likely pushes back to options #1 or #2. Those vendor libs are absolutely required for the console to work correctly.
  4. Re-implement the node.js parts in source native to Ambari
  5. Find similar functionality that is already packaged in Fedora and provide support for its use in the Ambari build
  6. Abandon packaging Ambari for Fedora

Missing Java Dependencies

There are only 2 Java dependencies that aren't in Fedora that Ambari needs, both for the test phase.

  1. org.springframework:spring-mock
  2. org.powermock:powermock-api-easymock

The second of these, powermock-api-easymock, is a module of powermock, which is already packaged in Fedora; however, the powermock package currently disables the easymock module. A Bugzilla report (https://bugzilla.redhat.com/show_bug.cgi?id=1074674) has been raised for this issue, since the latest version of easymock appears to be compatible.

Dependency Version Issues

Puppet

Ambari uses puppet manifests and directives for provisioning Hadoop components on hosts. At build time, puppet version 2.7.9 is downloaded from Puppet Labs and added to the agent package. However, the version currently available in rawhide is 3.4.3. The puppet parser validates a configuration when it is applied, so the version mismatch poses problems in two areas (so far):

  • Agent modules have variable declarations like "$core-site=...". Version 3.4.3 forbids hyphens in variable names (alphanumeric and underscore only).
  • At install, the agent retrieves puppet manifests for the selected stack (e.g., HDP 2.0.6). The structure of those cannot be processed by version 3.4.3 and it fails validation with "Import loop detected" errors.
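The first problem above, hyphenated variable names, can be detected mechanically before a manifest ever reaches the puppet 3.x parser. The following is a minimal sketch, not part of Ambari or puppet: the function names and the regular expression are assumptions made for illustration, and the pattern only covers simple assignments of the "$core-site=..." form quoted above.

```python
import re

# Assumption: a problematic declaration looks like "$core-site = ...",
# i.e. a "$" name containing at least one hyphen followed by a single "=".
# The lookahead skips "==", "=~" and "=>" so comparisons are not reported.
HYPHENATED_VAR = re.compile(
    r"\$([A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)+)\s*=(?![=~>])"
)


def find_hyphenated_vars(manifest_text):
    """Return (line_number, variable_name) pairs for hyphenated assignments."""
    hits = []
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        for match in HYPHENATED_VAR.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def suggest_rename(name):
    """Propose a name using only alphanumerics and underscores."""
    return name.replace("-", "_")


if __name__ == "__main__":
    sample = '$core-site = "x"\n$hdfs_site = "y"\n'
    for lineno, name in find_hyphenated_vars(sample):
        print(lineno, name, "->", suggest_rename(name))
```

A scan like this only locates the declarations; every reference to a renamed variable would also need updating, and it does nothing for the "Import loop detected" failures, which concern manifest structure rather than naming.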

The last version of puppet 2.7 built for Fedora (http://koji.fedoraproject.org/koji/buildinfo?buildID=404234) still builds in rawhide at this point in time. However, it would have to replace the incumbent version (3.4.x), because the two packages have files in common.

Facter and Ruby

Rawhide currently has Facter 1.7.4 and Ruby 2.0.0 (deps for Puppet) while the Ambari build bundles older versions of both in the agent. Ruby 2.0.0 is not compatible with the older version of Puppet.

Note: (Error: Could not autoload puppet/type/file: constant Puppet::Type::File not defined)

Python

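The Ambari build and runtime are hard coded to use python2.6: pom files, python scripts, javascript...everything. There are 2 upstream Jiras with patches to address these issues: AMBARI-1790 (https://issues.apache.org/jira/browse/AMBARI-1790) and AMBARI-1779 (https://issues.apache.org/jira/browse/AMBARI-1779). More current versions of those Jira patches can be found at https://github.com/fedora-bigdata/ambari/commit/5acd732fb32c923880dc4bcb4f6b4ff6a4b455c1. Alternatively, this can be done in-spec as a sed manipulation.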
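To illustrate the kind of substitution such a sed manipulation would perform, here is a minimal Python sketch. It is not the upstream patch: the function name is hypothetical, and the default replacement of "python2.7" is an assumption about the target interpreter rather than something stated above.

```python
import re

# Matches the literal "python2.6" wherever it appears (shebangs, library
# paths, pom properties). The trailing \b keeps strings such as
# "python2.66" from being rewritten by accident.
PY26 = re.compile(r"python2\.6\b")


def unpin_python(text, replacement="python2.7"):
    """Replace hard-coded python2.6 references in a file's contents.

    The replacement value is an assumption; a spec file would normally
    substitute whatever interpreter name the distribution provides.
    """
    return PY26.sub(replacement, text)


if __name__ == "__main__":
    print(unpin_python("#!/usr/bin/env python2.6"))
    print(unpin_python("/usr/lib/python2.6/site-packages/ambari_agent"))
```

Because the references are spread across pom files, scripts and javascript, the same substitution would have to be applied to every affected file type during the prep stage of the build.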

Jetty

The current version of Jetty in Fedora is Jetty 9, but Ambari is coded for Jetty 7. Fedora now has a Jetty 8 compatibility package in rawhide, and the necessary modifications are at https://github.com/fedora-bigdata/ambari/commit/56cff7ec5ace3ea5a140890ee3caee04412e67df.

Postgres

The version of postgres in Fedora may require updates to the database initialization done by Ambari. There is an upstream patch to address this (https://issues.apache.org/jira/browse/AMBARI-1792), and the issue appears to have been fixed for 1.5.0 (yet to be released).

Open Issues

Oracle JDK6

The runtime is designed and implemented to search for the Oracle JDK 6 locally and, if necessary, download it from the Hortonworks site (specifically jdk-6u31-linux-x64.bin). A command-line argument can be passed to the ambari-server setup task (-j <jvm location>) that will use that JVM path uniformly for both agents and the server. Thus, OpenJDK7 is technically supported though not the default.

Hadoop 2.x Support

The current Ambari release (1.4.4) supports the Hadoop 2.0.6 HDP release; Fedora has 2.2.0. Due to the nature of executing a downloaded HDP stack from Hortonworks, it is unknown at this time if there are specific compatibility issues with 2.2.0.

rpm-maven-plugin

The Ambari build uses the rpm-maven-plugin to generate rpms. Obviously, this maven plugin doesn't exist in Fedora and likely never will since it is antithetical to Fedora packaging from specifications. A Fedora spec build can ignore the presence of this plugin and just use artifacts as they sit in BUILD, but it does represent a significant disconnect between upstream and Fedora.

Fedora Packaging Repository

Ambari has the ability to install packages on a client machine, and it pulls those packages from Hortonworks repos that are hard coded in the server. It determines which repositories to use based upon the OS, and Fedora is not recognized as a valid/supported OS. Ambari will need to be modified not only to accept Fedora as a valid OS, but also to pull the packages from Fedora repos and not from Hortonworks. This Fedora-specific issue has been logged with upstream (https://issues.apache.org/jira/browse/AMBARI-3174), as has a more general architecture request for CDH and Apache repos (https://issues.apache.org/jira/browse/AMBARI-3524).
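The difference between a hard-coded, OS-keyed repository table and a pluggable one can be sketched as follows. This is purely illustrative and does not reflect Ambari's actual code: the OS keys, the placeholder URLs and every name in the block are hypothetical.

```python
# Hypothetical stand-in for a repository table that is hard coded in the
# server. The OS keys and URLs are placeholders, not real Ambari values.
HARD_CODED_REPOS = {
    "centos6": "http://repo.example.invalid/hdp/centos6",
    "suse11": "http://repo.example.invalid/hdp/suse11",
}


class UnsupportedOSError(Exception):
    """Raised when no repository is registered for the requested OS."""


def repo_for(os_name, repos=HARD_CODED_REPOS):
    """Look up the package repository for an OS, rejecting unknown ones."""
    try:
        return repos[os_name]
    except KeyError:
        raise UnsupportedOSError(os_name)


def with_extra_repos(extra, base=HARD_CODED_REPOS):
    """Return a new table that layers locally configured repos over the base."""
    merged = dict(base)
    merged.update(extra)
    return merged


if __name__ == "__main__":
    try:
        repo_for("fedora20")
    except UnsupportedOSError as err:
        print("rejected:", err)
    pluggable = with_extra_repos({"fedora20": "http://repo.example.invalid/fedora/20"})
    print(repo_for("fedora20", pluggable))
```

With the fixed table, a Fedora host is rejected outright; once the table can be extended from configuration, accepting Fedora and pointing it at Fedora repos becomes a data change rather than a code change.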

Note: Perhaps more than any other single issue listed previously, this is the one of most architectural importance. Ambari as it is constructed today is specifically developed to work with Hortonworks HDP stacks. It does so to the point of enforcing strict OS agreement between agents and the cluster server at startup, registration, stack installation, etc., and of bundling dependencies such as puppet and ruby with the agent. This needs to be addressed with upstream in terms of planning for a more "open" pluggable approach to stacks, including agents that can be installed using locally available dependencies.