Apache Hadoop 2.x

Summary

Provide native Apache Hadoop packages.

Owner

Name: Matthew Farrellee
Email: matt@fedoraproject.org
Release notes owner:

Current status

Targeted release: Fedora 20
Last updated: 29 Oct 2013
Tracker bug: RHBZ #998521

Detailed Description

Apache Hadoop is a widely used, increasingly complete big data platform, with a strong open source community and growing ecosystem. The goal is to package and integrate the core of the Hadoop ecosystem for Fedora, allowing for immediate use and creating a base for the rest of the ecosystem.

Benefit to Fedora

The Apache Hadoop software will be packaged and integrated with Fedora. The core of the Hadoop ecosystem will be available with Fedora and provide a base for additional packages.

Scope

Proposal owners:
- Note: target is Apache Hadoop 2.2.0
- Package all dependencies needed for Apache Hadoop 2.x
- Package the Apache Hadoop 2.x software

Other developers: N/A (not a System Wide Change)

Release engineering: N/A (not a System Wide Change)

Policies and guidelines: N/A (not a System Wide Change)

Upgrade/compatibility impact

N/A (not a System Wide Change)

How To Test

Install the hadoop rpms with:
- yum install hadoop-common hadoop-common-native hadoop-hdfs hadoop-mapreduce hadoop-mapreduce-examples hadoop-yarn
Initialize the HDFS directories:
- hdfs-create-dirs
Start the cluster by issuing:
- systemctl start hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager
Create a directory for the user running the tests:
1. runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/<name>"
2. runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown <name> /user/<name>"
The user from the previous step can run jobs like the following mapreduce examples:
- hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar pi 10 1000000
- hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar randomwriter out
- These three must be run in this order:
  1. hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar teragen 100 gendata
  2. hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar terasort gendata sortdata
  3. hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar teravalidate sortdata reportdata
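
The per-user directory step above can be wrapped in a small helper. The sketch below only prints the two runuser commands for a given user name so they can be reviewed before being run as root. The function name hdfs_user_setup_cmds and the user name alice are hypothetical and not part of the Hadoop packages.

```shell
# Print (not run) the commands that create /user/<name> in HDFS and hand it
# over to that user. hdfs_user_setup_cmds is a hypothetical helper name.
hdfs_user_setup_cmds() {
    name=$1
    # Refuse empty names and names with characters that would need quoting.
    case $name in
        ''|*[!A-Za-z0-9._-]*)
            echo "invalid user name: '$name'" >&2
            return 1
            ;;
    esac
    printf 'runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/%s"\n' "$name"
    printf 'runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown %s /user/%s"\n' "$name" "$name"
}

# Example: show the commands for the illustrative user "alice".
hdfs_user_setup_cmds alice
```

Printing rather than executing keeps the helper safe to try on a machine without a running cluster.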

User Experience

N/A (not a System Wide Change)

Dependencies

N/A (not a System Wide Change)

Contingency Plan

Contingency mechanism: N/A (not a System Wide Change)
Contingency deadline: N/A (not a System Wide Change)
Blocks release? N/A (not a System Wide Change)

Documentation

http://wiki.apache.org/hadoop

Release Notes

TODO

Effort details

People involved

Name	IRC	Focus	Additional
Matthew Farrellee	mattf	keeping track, integration testing	UTC-5
Peter MacKinnon	pmackinn	packaging, testing	UTC-5
Rob Rati	rsquared	packaging	UTC-5
Timothy St. Clair	tstclair	config, upstream tracking	UTC-6
Sam Kottler	skottler	packaging	UTC-5
Gil Cattaneo	gil	packaging	UTC+1
Christopher Meng	cicku	packaging, testing	UTC+8

Detailed status

Last updated: 29 Oct 2013
Percentage of completion
- Dependencies available in Fedora (missing since project initiation): 100%
- Adaptation of Hadoop 2.2.0 source via patches: 100%
- Hadoop spec completion: 100%
Test suite (Updated to 2.2.0; all skips are from upstream)

Module	Tests	Failures	Errors	Skipped
hadoop-auth	48	0	0	0
hadoop-common	2015	4	0	64
hadoop-nfs	46	0	0	0
hadoop-hdfs	2040	8	1	7
hadoop-hdfs-httpfs	286	0	0	0
hadoop-hdfs-bkjournal	32	0	0	0
hadoop-hdfs-nfs	9	1	0	0
hadoop-yarn-common	109	0	0	0
hadoop-yarn-client	17	0	0	0
hadoop-yarn-server-common	3	0	0	0
hadoop-yarn-server-nodemanager	153	0	1	0
hadoop-yarn-server-web-proxy	9	0	0	0
hadoop-yarn-server-resourcemanager	277	0	0	0
hadoop-yarn-server-tests	7	0	0	0
hadoop-yarn-applications-distributedshell	2	0	0	0
hadoop-yarn-applications-unmanaged-am-launcher	1	0	0	0
hadoop-mapreduce-examples	11	0	0	1
hadoop-mapreduce-client-core	56	0	0	0
hadoop-mapreduce-client-common	43	0	0	0
hadoop-mapreduce-client-shuffle	4	0	0	0
hadoop-mapreduce-client-app	210	2	0	0
hadoop-mapreduce-client-jobclient	443	0	3	12
hadoop-mapreduce-client-hs	128	2	0	0
hadoop-mapreduce-client-hs-plugins	1	0	0	0
hadoop-streaming	55	0	6	0
hadoop-distcp	112	0	0	0
hadoop-archives	2	0	0	0
hadoop-rumen	3	0	0	1
hadoop-gridmix	44	0	0	0
hadoop-datajoin	1	0	0	0
hadoop-extras	20	0	0	1

Approach

We are taking an iterative, depth-first approach to packaging. We do not have all the dependencies mapped out ahead of time. Dependencies are being tabulated into two groups:

missing - the dependency being requested from a hadoop-common pom has not yet been packaged, reviewed or generated into fedora repos
broken - the dependency requested is out of date with current fedora versions, and patches must be developed for inclusion in a hadoop rpm build that address any build, API or source code deltas

Note that a dependency may show up in both of these tables.

Anyone who wants to help should find an available dependency below, edit the table changing the state to Active and packager to yourself.

If you are lucky enough to pick a dependency that itself has unpackaged dependencies, identify the sub-dependencies and add them to the bottom of the Dependencies table below, change your current dependency to Blocked and repeat.

If your dependency is already packaged but the version is incompatible, contact the package owner and resolve the incompatibility in a mutually satisfactory way. For instance:

If the version available in Fedora is older, explore updating the package. If that is not possible, explore creating a package that includes a version in its name, e.g. pkgnameXY. Ultimately, the most recent version in Fedora should have the name pkgname while older versions have pkgnameXY. It may take a full Fedora release to rationalize package names. Make a note in the Dependencies table.

If the version you need is older than the packaged version, consider creating a patch to use the newer version. If a patch is not viable, proceed by packaging the dependency with a version in its name, e.g. pkgnameXY. Make a note in the Dependencies table.
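
As an illustration of the pkgnameXY convention described above, the sketch below derives a versioned package name from a package name and the version being kept. The helper name compat_pkg_name is hypothetical, and reducing the version to its major and minor parts is an assumption about how XY is formed.

```shell
# Derive a versioned compatibility package name (pkgnameXY) from a package
# name and a version string. compat_pkg_name is a hypothetical helper.
compat_pkg_name() {
    pkg=$1
    ver=$2
    # Keep only major.minor, then drop the dot (for example 1.8.9 -> 1.8 -> 18).
    xy=$(printf '%s\n' "$ver" | cut -d. -f1,2 | tr -d .)
    printf '%s%s\n' "$pkg" "$xy"
}

# Example: the name an older groovy 1.8 package would get.
compat_pkg_name groovy 1.8
```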

There is tattletale dependency graph data for both the baseline branch and the fedora development branch.

Running and debugging the unit test suite is discussed in the test suite section below and results are maintained in the test suite results table.

You will run into situations where the Apache Hadoop source needs to be patched to handle the Fedora version of a dependency. Those patches are candidates to propose upstream, are tracked in the upstream patch tracking table, and maintained in the source repositories below. Any changes that are required to conform to Fedora's packaging guidelines or deal with a package naming issue should be contained to the hadoop spec file.

In handling patches, the intention of this process is to isolate changes to a single dependency so patches can be created that can be consumed upstream. It is important that changes to the source be isolated to 1 dependency and the changes must be self-contained. A dependency is not necessarily a single jar file. Changes to a dependency should entail everything needed to use the jar files from a later release of the dependency.

Source repositories:

https://github.com/fedora-bigdata/hadoop-common Fork of Apache Hadoop for changes required to support compilation on Fedora
https://github.com/fedora-bigdata/hadoop-rpm Spec and supporting files for generating an RPM for Fedora

Dependency Branches

All code/build changes to Apache Hadoop should be made on a branch in the hadoop-common repo that is based off the branch-2.2.0 branch and follows this naming convention:

fedora-patch-<dependency>

Where <dependency> is the name of the dependency being worked on. Changes to this branch should ONLY relate to that dependency. Do not include the dependency version in the branch name. These branches will be updated as needed because of Fedora or Hadoop updates until they are accepted upstream by Apache Hadoop. Leaving the version out of the name lets a branch move from version 1 to 2 to 3 of the dependency without confusion, if that is required before the patch is accepted upstream.
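
The naming rule above can be sketched as a small shell helper. The function name dep_branch_name is hypothetical, the version check is only a rough heuristic, and the printed git command assumes the remote is called origin.

```shell
# Build the branch name for a dependency patch branch: fedora-patch-<dependency>.
# dep_branch_name is a hypothetical helper. It rejects names that look like they
# carry a version (digit.digit, or a dash followed by a digit); names such as
# jets3t or slf4j are accepted.
dep_branch_name() {
    dep=$1
    case $dep in
        '')
            echo "dependency name required" >&2
            return 1
            ;;
        *[0-9].[0-9]*|*-[0-9]*)
            echo "leave the version out of the branch name: $dep" >&2
            return 1
            ;;
    esac
    printf 'fedora-patch-%s\n' "$dep"
}

# Print the git command that would create the branch from branch-2.2.0.
branch=$(dep_branch_name jetty) && echo "git checkout -b $branch origin/branch-2.2.0"
```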

Integration Branch

An integration branch should be created in the hadoop-common repository that corresponds with the release version being packaged using the following naming convention:

fedora-<version>-integration

where <version> is the Hadoop version being packaged. All branches containing changes that have not yet been accepted upstream should be merged to the integration branch, and the result should pass the build and all tests. Once this is complete, a patch should be generated and pushed to the hadoop-rpm repository.
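
One possible way to assemble the integration branch is sketched below. It prints the git commands instead of running them. The helper name integration_plan, the use of origin/branch-<version> as the base, and the patch file name are assumptions made for illustration, not the project's documented procedure.

```shell
# Print (not run) a possible command sequence for building the integration branch.
# integration_plan is a hypothetical helper. Arguments: the Hadoop version,
# followed by the dependency branches that are still pending upstream.
integration_plan() {
    version=$1
    shift
    target="fedora-$version-integration"
    echo "git checkout -b $target origin/branch-$version"
    for b in "$@"; do
        echo "git merge origin/$b"
    done
    # One way to produce the patch for hadoop-rpm: diff against the base branch.
    # The output file name is illustrative only.
    echo "git diff origin/branch-$version $target > hadoop-fedora-integration.patch"
}

integration_plan 2.2.0 fedora-patch-jetty fedora-patch-jasper
```

The build and the test suite still have to be run on the merged result before the patch is generated.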

Test suite

In order to attempt to run any part of the test suite, you must first build the components (F20):

git clone git://github.com/fedora-bigdata/hadoop-common.git
cd hadoop-common
git checkout -b  fedora-2.2.0-test origin/fedora-2.2.0-test
xmvn -B -o -Pdist,native -DskipTest -DskipTests -DskipIT install

If you are interested in the whole ball of wax, then run the following (the note at the end of this section explains the disablejsr199 option):

xmvn -B -o -X -Dorg.apache.jasper.compiler.disablejsr199=true test

and go mow a football field or knit a sweater. Note that this can still produce spurious failures. Add -Dmaven.test.failure.ignore=true to the above line if you want the build to continue past test failures so that every module's results are reported.

The fedora-2.2.0-test branch excludes identified consistently failing tests. You can edit your copy of hadoop-project/pom.xml to bring any of them back into play.

If you are interested in investigating specific failures such as active ones from the table above then target the module, test class, and even method as you see fit:

xmvn -B -o -X -pl :hadoop-common test -Dtest=TestSSLHttpServer#testEcho
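
The targeted invocation above follows a fixed pattern, so it can be composed by a small helper. The sketch below prints the command rather than running it. The function name xmvn_test_cmd is hypothetical; the flags are the ones used in this section.

```shell
# Compose the xmvn command line for one module, optionally narrowed to a test
# class and a single method. xmvn_test_cmd is a hypothetical helper name.
xmvn_test_cmd() {
    module=$1
    class=${2:-}
    method=${3:-}
    cmd="xmvn -B -o -X -pl :${module} test"
    if [ -n "$class" ]; then
        cmd="${cmd} -Dtest=${class}"
        # A single method is selected with Class#method.
        if [ -n "$method" ]; then
            cmd="${cmd}#${method}"
        fi
    fi
    printf '%s\n' "$cmd"
}

xmvn_test_cmd hadoop-common TestSSLHttpServer testEcho
```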

All your hard work results in a patch? Great! Hit a contributor up with it and we'll review and apply if everything looks cool.

[1] The -Dorg.apache.jasper.compiler.disablejsr199=true option is required for TestHttpServer#testContentTypes to pass, due to the use of GlassFish JSP support.

Dependencies

Missing dependency legend
State	Notes
Available	free for someone to take
Active	dependency is actively being packaged if missing, or patch is being developed or tested for inclusion in hadoop-common build
Blocked	pending packages for dependencies
Review	under review, include link to review BZ
Complete	woohoo!

Missing Dependencies
Project	State	Review BZ	Packager	Notes
hadoop	Complete	RHBZ #985087	rrati,pmackinn
bookkeeper	Complete	RHBZ #948589	gil	Version 4.0 requested. packaged 4.2.1. Patch: BOOKKEEPER-598
glassfish-gmbal	Complete	RHBZ #859112	gil	F18 build
glassfish-management-api	Complete	RHBZ #859110	gil	F18 build
grizzly	Complete	RHBZ #859114	gil	Only for F20 for now. Cause: missing glassfish-servlet-api on F18 and F19.
groovy	Complete	RHBZ #858127	gil	1.5 requested but 1.8 packaged in Fedora. Going forward, the 1.8 series will possibly be known as groovy18 and groovy will be 2.x.
jersey - jersey1	Complete	RHBZ #825347 RHBZ #1223831	gil	F18 build. Should be rebuilt with grizzly2 support enabled.
jets3t	Complete	RHBZ #847109	gil
jspc-compiler	Complete	RHBZ #960720	pmackinn	Passes preliminary overall hadoop-common compilation/testing.
~~kfs~~	~~Review~~	~~RHBZ #960728~~	~~pmackinn~~	~~kfs has become Quantcast qfs.~~ No longer a dependency in 2.0.5-beta and later
maven-native	Complete	RHBZ #864084	gil	Needs patch to build with java7. NOTE: javac target/source is already set by mojo.java.target option
zookeeper	Complete	RHBZ #823122	gil	requires jtoaster

Broken Dependencies
Project	Packager	Notes
ant		Version 1.6 requested, 1.8 currently packaged in Fedora. Needs to be inspected for API/functional incompatibilities(?)
apache-commons-collections	pmackinn	Java import compilation error with existing package. Patches for hadoop-common being tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-collections
apache-commons-math	pmackinn	Current apache-commons-math uses math3 in pom instead of math, and API changes in code. Patches for hadoop-common being tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-math
cglib	pmackinn	Missing an explicit dep which the old dep chain didn't need. Patches for hadoop-common being tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-cglib
ecj	rrati	Need ecj version ecj-4.2.1-6 or later to resolve a dependency lookup issue
gmaven	gil	Version 1.0 requested, available 1.4 (but has broken deps) RHBZ #914056
hadoop-hdfs	pmackinn	glibc link error in hdfs native build. Patch for hadoop-common being tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-cmake-hdfs
hsqldb	tradej	1.8 in fedora, update to 2.2.9 in the process. API compatibility to be checked.
jersey - jersey1	pmackinn	Needs jersey-servlet and version. Tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-jersey NOTE: the jersey1 version should be specified (1 or 1.19)
jets3t	pmackinn	Requires 0.6.1. With 0.9.x: hadoop-common Jets3tNativeFileSystemStore.java error: incompatible types S3ObjectsChunk chunk = s3Service.listObjectsChunked(bucket.getName(). Patches for hadoop-common being tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-jets3t
jetty	rrati	jetty8 packaged in Fedora, but 6.x requested. 6 and 8 are incompatible. Patches tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-jetty
slf4j	pmackinn	Package in fedora fails to match in dependency resolution. jcl104-over-slf4j dep in hadoop-common moved to jcl-over-slf4j as part of jspc/tomcat dep. Patch being tracked at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-jasper
tomcat-jasper	pmackinn	Version 5.5.x requested. Adaptations made for incumbent Tomcat 7 via patches at https://github.com/fedora-bigdata/hadoop-common/tree/fedora-patch-jasper. Reviewing fit as part of overall hadoop-common compilation/testing.

Test suite results

Based on Fedora 2.2.0 test branch

[INFO] Apache Hadoop Auth ................................ SUCCESS
Tests run: 48, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop Common .............................. FAILURE
Failed tests: 
  TestHttpServer.testContentTypes:264->Assert.assertEquals:542->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88 expected:<200> but was:<500>
  TestDoAsEffectiveUser.testRealUserGroupNotSpecified:356 The RPC must have failed proxyUser (auth:PROXY) via realUser1@HADOOP.APACHE.ORG (auth:SIMPLE)
  TestDoAsEffectiveUser.testRealUserAuthorizationSuccess:229 null
  TestRPC.testStopsAllThreads:777 Expect no Reader threads running before test expected:<0> but was:<1>
Tests run: 2015, Failures: 4, Errors: 0, Skipped: 64

[INFO] Apache Hadoop NFS ................................. SUCCESS
Tests run: 46, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop HDFS ................................ FAILURE
Failed tests: 
  TestBlockReport.blockReport_08:466 Wrong number of PendingReplication blocks expected:<2> but was:<1>
  TestEditLogRace.testSaveRightBeforeSync:512 null
  TestSnapshotPathINodes.testNonSnapshotPathINodes:149->assertSnapshot:124 expected:<null> but was:<Snapshot.s1(id=1)>
  TestSnapshotPathINodes.testSnapshotPathINodesAfterModification:389->assertSnapshot:124 expected:<Snapshot.s4(id=0)> but was:<Snapshot.s1(id=1)>
  TestRBWBlockInvalidation.testBlockInvalidationWhenRBWReplicaMissedInDN:116 There should be 1 replica in the corruptReplicasMap expected:<1> but was:<0>
  TestReplicationPolicyWithNodeGroup.testRereplicate1:409 null
  TestReplicationPolicyWithNodeGroup.testRereplicate3:470 expected:<0> but was:<1>
  TestFileStatus.testGetFileStatusOnNonExistantFileDir:182 listStatus of non-existent path should fail
Tests in error: 
  TestTransferFsImage.testImageTransferTimeout:134->Object.wait:-2 »  test timed...
Tests run: 2040, Failures: 8, Errors: 1, Skipped: 7

[INFO] Apache Hadoop HttpFS .............................. SUCCESS
Tests run: 286, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS
Tests run: 32, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop HDFS-NFS ............................ FAILURE
Failed tests: 
  TestDFSClientCache.testEviction:48 null
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-common ................................ SUCCESS
Tests run: 123, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-server-common ......................... SUCCESS
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-server-nodemanager .................... FAILURE
Failed tests: 
  TestLocalResourcesTrackerImpl.testLocalResourceCache:302 
Wanted but not invoked:
eventHandler.handle(
    isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceFailedEvent)
);
-> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:302)
Actually, there were zero interactions with this mock.
  TestLogAggregationService.testLocalFileDeletionAfterUpload:201 check /home/pmackinn/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/TestLogAggregationService-localLogDir/application_1234_0001/container_1234_0001_01_000001/stdout
Tests run: 193, Failures: 2, Errors: 0, Skipped: 1

[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-server-resourcemanager ................ FAILURE
Failed tests: 
  TestRMWebServicesApps.testAppsQueryStatesNone:360 apps is not null expected:<null> but was:<{}>
  TestFifoScheduler.testAppAttemptMetrics:153 expected:<1> but was:<3>
Tests in error: 
  TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
  TestAMAuthorization.testUnauthorizedAccess:284 NullPointer
  TestClientRMTokens.testShortCircuitRenewCancel:240->checkShortCircuitRenewCancel:306 » Runtime
  TestClientRMTokens.testShortCircuitRenewCancelWildcardAddress:247->checkShortCircuitRenewCancel:306 » NullPointer
  TestRMWebServicesFairScheduler.<init>:80->JerseyTest.<init>:220->JerseyTest.getContainer:345 » IllegalState
  TestRMWebServicesFairScheduler.<init>:80->JerseyTest.<init>:220->JerseyTest.getContainer:345 » IllegalState
Tests run: 443, Failures: 2, Errors: 6, Skipped: 1

[INFO] hadoop-yarn-server-tests .......................... SUCCESS
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-client ................................ FAILURE

Failed tests: 
  TestNMClient.testNMClientNoCleanupOnStop:199->testContainerManagement:336->testGetContainerStatus:374 null
Tests in error: 
  TestGetGroups>GetGroupsTestBase.testMultipleNonExistingUsers:85->GetGroupsTestBase.runTool:119 » UnknownHost
  TestGetGroups>GetGroupsTestBase.testExistingInterleavedWithNonExistentUsers:95->GetGroupsTestBase.runTool:119 » UnknownHost
  TestGetGroups>GetGroupsTestBase.testNoUserGiven:53->GetGroupsTestBase.runTool:119 » UnknownHost
  TestGetGroups>GetGroupsTestBase.testNonExistentUser:76->GetGroupsTestBase.runTool:119 » UnknownHost
  TestGetGroups>GetGroupsTestBase.testExistingUser:61->GetGroupsTestBase.runTool:119 » UnknownHost
Tests run: 52, Failures: 1, Errors: 5, Skipped: 0

[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-mapreduce-client-core ...................... SUCCESS
Tests run: 94, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-mapreduce-client-common .................... FAILURE
Tests in error: 
  TestJobClient.testIsJobDirValid:57 »  test timed out after 1000 milliseconds
Tests run: 44, Failures: 0, Errors: 1, Skipped: 0

[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-mapreduce-client-app ....................... FAILURE
Failed tests: 
  TestFetchFailure.testFetchFailureMultipleReduces:332 expected:<SUCCEEDED> but was:<RUNNING>
  TestAMWebServicesAttempts.testTaskAttemptsDefault:172->verifyAMTaskAttempts:462->verifyAMTaskAttempt:433->verifyTaskAttemptGeneric:498 type doesn't match, got: ["reduceTaskAttemptInfo","REDUCE"] expected: REDUCE
  TestAMWebServicesAttempts.testTaskAttempts:134->verifyAMTaskAttempts:462->verifyAMTaskAttempt:433->verifyTaskAttemptGeneric:498 type doesn't match, got: ["reduceTaskAttemptInfo","REDUCE"] expected: REDUCE
  TestAMWebServicesAttempts.testTaskAttemptsSlash:153->verifyAMTaskAttempts:462->verifyAMTaskAttempt:433->verifyTaskAttemptGeneric:498 type doesn't match, got: ["reduceTaskAttemptInfo","REDUCE"] expected: REDUCE
Tests run: 229, Failures: 4, Errors: 0, Skipped: 0

[INFO] hadoop-mapreduce-client-hs ........................ FAILURE
Failed tests: 
  TestHsWebServicesAttempts.testTaskAttemptsDefault:185->verifyHsTaskAttempts:481->verifyHsTaskAttempt:452->verifyTaskAttemptGeneric:517 type doesn't match, got: ["reduceTaskAttemptInfo","REDUCE"] expected: REDUCE
  TestHsWebServicesAttempts.testTaskAttempts:146->verifyHsTaskAttempts:481->verifyHsTaskAttempt:452->verifyTaskAttemptGeneric:517 type doesn't match, got: ["reduceTaskAttemptInfo","REDUCE"] expected: REDUCE
  TestHsWebServicesAttempts.testTaskAttemptsSlash:166->verifyHsTaskAttempts:481->verifyHsTaskAttempt:452->verifyTaskAttemptGeneric:517 type doesn't match, got: ["reduceTaskAttemptInfo","REDUCE"] expected: REDUCE
Tests run: 141, Failures: 3, Errors: 0, Skipped: 0

[INFO] hadoop-mapreduce-client-jobclient ................. FAILURE
Failed tests: 
  TestMRJobs.testFailingMapper:313 expected:<TIPFAILED> but was:<FAILED>
  TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithoutUpdateEvents:118 Couldn't speculate successfully
  TestUberAM.testFailingMapper:141 expected:<TIPFAILED> but was:<FAILED>
  TestMapReduceJobControl.testJobControlWithFailJob:134 null
  TestUserDefinedCounters.testMapReduceJob:113 null
  TestMiniMRClientCluster.testJob:162 null
  TestMiniMRClientCluster.testRestart:146 Address before restart: localhost.localdomain:0 is different from new address: localhost:37400 expected:<localhost[.localdomain:]0> but was:<localhost[:3740]0>
  TestMiniMRChildTask.testTaskEnv:386->runTestTaskEnv:463 The environment checker job failed.
  TestMiniMRChildTask.testTaskTempDir:363->launchTest:169 null
  TestMiniMRChildTask.testTaskOldEnv:409->runTestTaskEnv:463 The environment checker job failed.
Tests in error: 
  TestMapReduceLazyOutput.testLazyOutput » Remote File /testlazy/input/text0.txt...
  TestLazyOutput.testLazyOutput » Remote File /testlazy/input/text0.txt could on...
  TestTextOutputFormat.testCompress:200 » FileNotFound /home/pmackinn/hadoop-com...
  TestJobCleanup.testCustomCleanup:319->testFailedJob:199 NullPointer
  TestJobCleanup.testDefaultCleanupAndAbort:275->testFailedJob:199 NullPointer
  TestJobCleanup.testCustomAbort:296->testFailedJob:199 NullPointer
  TestClusterMapReduceTestCase.testMapReduceRestarting:93->_testMapReduce:67 » IO
  TestClusterMapReduceTestCase.testMapReduce:89->_testMapReduce:67 » IO Job fail...
  TestMiniMRWithDFSWithDistinctUsers.setUp:97 » YarnRuntime java.lang.OutOfMemor...
  TestMiniMRWithDFSWithDistinctUsers.setUp:97 » YarnRuntime java.lang.OutOfMemor...
  TestMerge.testMerge:86->runMergeTest:145->verifyOutput:156 » FileNotFound File...
  TestJobName.testComplexNameWithRegex:89 » IO Job failed!
  TestJobName.testComplexName:55 » IO Job failed!
  TestFileInputFormat.testLocality:62->createInputs:103 » OutOfMemory unable to ...
  TestFileInputFormat.testNumInputs:115->newDFSCluster:47 » OutOfMemory unable t...
Tests run: 378, Failures: 10, Errors: 15, Skipped: 11

[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS
Tests run: 11, Failures: 0, Errors: 0, Skipped: 1

[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS
Tests run: 56, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop Distributed Copy .................... FAILURE
Failed tests: 
  TestCopyCommitter.testNoCommitAction:122 Commit failed
Tests in error: 
  TestCopyMapper.testCopyFailOnBlockSizeDifference:706 NullPointer
Tests run: 113, Failures: 1, Errors: 1, Skipped: 0

[INFO] Apache Hadoop Archives ............................ SUCCESS
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop Rumen ............................... SUCCESS
Tests run: 3, Failures: 0, Errors: 0, Skipped: 1

[INFO] Apache Hadoop Gridmix ............................. FAILURE
Tests in error: 
  TestGridMixClasses.testLoadJobLoadRecordReader:627 »  test timed out after 100...
Tests run: 74, Failures: 0, Errors: 1, Skipped: 0

[INFO] Apache Hadoop Data Join ........................... SUCCESS
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO] Apache Hadoop Extras .............................. SUCCESS
Tests run: 20, Failures: 0, Errors: 0, Skipped: 1

Tests are listed in the order of execution
Baseline: F18, maven 3.0.5, Oracle JDK 1.6u45
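
To get overall totals from a listing like the one above, the per-module summary lines can be summed with awk. This is a minimal sketch: the helper name sum_test_totals is hypothetical, and it assumes that only lines ending right after the Skipped count are module summaries, so that per-class lines with a trailing "Time elapsed" field are not counted twice.

```shell
# Sum the per-module "Tests run: ..." summary lines of a Maven log read on stdin.
# sum_test_totals is a hypothetical helper name.
sum_test_totals() {
    awk -F'[:,] *' '
        /^Tests run: [0-9]+, Failures: [0-9]+, Errors: [0-9]+, Skipped: [0-9]+$/ {
            run += $2; fail += $4; err += $6; skip += $8
        }
        END {
            printf "Tests run: %d, Failures: %d, Errors: %d, Skipped: %d\n", run, fail, err, skip
        }
    '
}

# Example with the first two modules from the listing.
sum_test_totals <<'EOF'
[INFO] Apache Hadoop Auth ................................ SUCCESS
Tests run: 48, Failures: 0, Errors: 0, Skipped: 0
[INFO] Apache Hadoop Common .............................. FAILURE
Tests run: 2015, Failures: 4, Errors: 0, Skipped: 64
EOF
```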

Upstream patch tracking

Currently tracking against branch-2 @ https://github.com/timothysc/hadoop-common

Modification Tracking
Branch	Committer	JIRA	Target	Status
fedora-patch-math	pmackinn	https://issues.apache.org/jira/browse/HADOOP-9594	2.1.0-beta	PENDING REVIEW
~~fedora-patch-junit~~	pmackinn	https://issues.apache.org/jira/browse/HADOOP-9605	2.0.5-alpha	COMMITTED
~~fedora-patch-javadocs~~	rsquared	https://issues.apache.org/jira/browse/HADOOP-9607	2.0.5-alpha	COMMITTED
fedora-patch-collections	pmackinn	https://issues.apache.org/jira/browse/HADOOP-9610	2.1.0-beta	PENDING REVIEW
fedora-patch-cglib	pmackinn	https://issues.apache.org/jira/browse/HADOOP-9611	2.1.0-beta	PENDING REVIEW
fedora-patch-jersey	pmackinn	https://issues.apache.org/jira/browse/HADOOP-9613	2.1.0-beta	PENDING REVIEW
fedora-patch-jets3t	pmackinn	~~https://issues.apache.org/jira/browse/HADOOP-9623~~ https://issues.apache.org/jira/browse/HADOOP-9680	2.1.0-beta	PENDING REVIEW
fedora-patch-jetty	pmackinn	https://issues.apache.org/jira/browse/HADOOP-9650	2.1.0-beta	RE-EVAL DUE TO F19 jetty-9
fedora-patch-jasper	pmackinn	https://lists.fedoraproject.org/pipermail/bigdata/2013-June/000026.html	N/A	Carrying Patch Until Further Notice
~~fedora-patch-cmake-hdfs~~	~~pmackinn~~	~~N/A~~	2.0.5-alpha	Already Modified Upstream

Packager Resources

Packager tips

mvn-rpmbuild utility will ONLY resolve from system repo
mvn-local will resolve from system repo first then fallback to maven if unresolved
- can be used to find the delta between system repo packages available and missing dependencies that can be viewed in the .m2 local maven repo (find ~/.m2/repository -name '*.jar')
-Dmaven.local.debug=true
- reveals how JPP lookups are executed per dependency: useful for finding groupId,artifactId mismatches
-Dmaven.test.skip=true
- tells maven to skip both compiling and running the tests
- useful for unblocking end-to-end build
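
The mvn-local tip above suggests inspecting ~/.m2 to see what could not be resolved from the system repo. The sketch below turns each downloaded jar path into groupId:artifactId:version coordinates, which are easier to compare against Fedora packages. The helper name m2_path_to_gav is hypothetical; it relies on the standard Maven repository layout of group directories, then artifactId, then version.

```shell
# Convert a jar path inside a Maven local repository into groupId:artifactId:version.
# m2_path_to_gav is a hypothetical helper. Arguments: the jar path and,
# optionally, the repository root (default: ~/.m2/repository).
m2_path_to_gav() {
    repo=${2:-$HOME/.m2/repository}
    rel=${1#"$repo"/}
    vdir=$(dirname "$rel")     # <group dirs>/<artifactId>/<version>
    adir=$(dirname "$vdir")    # <group dirs>/<artifactId>
    gdir=$(dirname "$adir")    # <group dirs>
    printf '%s:%s:%s\n' "$(printf '%s' "$gdir" | tr / .)" "$(basename "$adir")" "$(basename "$vdir")"
}

# List what Maven downloaded because the system repo could not provide it.
# Prints nothing if ~/.m2/repository does not exist.
find "$HOME/.m2/repository" -name '*.jar' 2>/dev/null | while read -r jar; do
    m2_path_to_gav "$jar"
done | sort -u
```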

An alternative to gmaven:

Apply a patch with the following content where required.
Test support is not guaranteed and is not expected to work.

     <plugin>
       <groupId>org.apache.maven.plugins</groupId>
       <artifactId>maven-antrun-plugin</artifactId>
       <version>1.7</version>
       <dependencies>
         <dependency>
           <groupId>org.codehaus.groovy</groupId>
           <artifactId>groovy</artifactId>
           <version>any</version>
         </dependency>
         <dependency>
           <groupId>antlr</groupId>
           <artifactId>antlr</artifactId>
           <version>any</version>
         </dependency>
         <dependency>
           <groupId>commons-cli</groupId>
           <artifactId>commons-cli</artifactId>
           <version>any</version>
         </dependency>
         <dependency>
           <groupId>asm</groupId>
           <artifactId>asm-all</artifactId>
           <version>any</version>
         </dependency>
         <dependency>
           <groupId>org.slf4j</groupId>
           <artifactId>slf4j-nop</artifactId>
           <version>any</version>
         </dependency>
       </dependencies>
       <executions>
         <execution>
           <id>compile</id>
           <phase>process-sources</phase>
           <configuration>
             <target>
               <mkdir dir="${basedir}/target/classes"/>
               <taskdef name="groovyc" classname="org.codehaus.groovy.ant.Groovyc">
                 <classpath refid="maven.plugin.classpath"/>
               </taskdef>
               <groovyc destdir="${project.build.outputDirectory}" srcdir="${basedir}/src/main" classpathref="maven.compile.classpath">
                 <javac source="1.5" target="1.5" debug="on"/>
               </groovyc>
             </target>
           </configuration>
           <goals>
             <goal>run</goal>
           </goals>
         </execution>
       </executions>
     </plugin>

GMavenPlus (a rewrite of GMaven) is now available:

     <plugin>
       <groupId>org.codehaus.gmavenplus</groupId>
       <artifactId>gmavenplus-plugin</artifactId>
       <version>1.5</version>
       <executions>
         <execution>
           <goals>
             <goal>generateStubs</goal>
             <goal>compile</goal>
             <goal>testCompile</goal>
           </goals>
         </execution>
       </executions>
     </plugin>

Other examples for use of this Maven plugin

YUM repositories

An RPM repository of dependencies already packaged and in, or heading towards, review state can be found here:

http://repos.fedorapeople.org/repos/rrati/hadoop/

Currently, only Fedora 18/19 x86_64 packages are available.
