Archive:PackagingDrafts/PrebuiltBinaryCheck

= No building with Prebuilt Binaries =

Goal
No file from the tarball that could be hiding a Trojan may be used in the build process. Scripts are okay, since you can read them to see what they're doing, but binaries, bytecode, and other files that are not intended to be read by humans are not.

Reasons
1) If a prebuilt binary is something that builds the package, or is used by something that builds it (a library, for instance), we can be subject to Trojans that no amount of examining the application's source code will reveal, because the Trojan is introduced by the build tool, not by the application the build tool is creating.

2) It makes it easier to audit packages for shipping prebuilt binaries: only packages that explicitly turned off the removal of prebuilt binaries can contain them at all. Turning removal off would be needed for initial bootstrapping and a few other places.

Implementation
rpm should have a script that runs the file command on every file in the untarred source tree. If the file type matches a blacklist of file types, it should be deleted. This should be done before we start building.
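
The pass described above could look roughly like the following Python sketch. It assumes the `file` utility is on the PATH; the names `describe` and `strip_prebuilt` are hypothetical, and the blacklist is passed in as a list of compiled regexes rather than defined here.

```python
import os
import subprocess


def describe(path):
    """Return the type description that `file -b` prints for path."""
    result = subprocess.run(
        ["file", "-b", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def strip_prebuilt(tree, blacklist, describe=describe):
    """Delete every regular file under tree whose type matches the blacklist.

    blacklist is a list of compiled regexes matched against the file
    description. Returns the sorted list of removed paths.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks rather than follow them
            kind = describe(path)
            if any(rx.search(kind) for rx in blacklist):
                os.remove(path)
                removed.append(path)
    return removed and sorted(removed)
```

Taking `describe` as a parameter keeps the matching logic testable without invoking `file`, and returning the removed paths gives the build log a record of what was stripped.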

The packager should have a way to turn this off, just as debuginfo generation can be turned off and find-provides can be overridden.

Gotchas: eggs and jars are just zip files. We'd need to unarchive those and examine what's inside to know what to do.
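
One way to look inside such archives is to open them as zip files and check each member instead of the container. The sketch below uses Python's `zipfile` module; `suspicious_members` and the suffix list are hypothetical, and matching on member names is a simplification. A real check would more likely extract each member and run `file` on it.

```python
import zipfile

# Hypothetical suffixes for compiled content often found in eggs and jars.
COMPILED_SUFFIXES = (".class", ".pyc", ".pyo", ".so", ".dll", ".exe")


def suspicious_members(archive):
    """Return the names of members in a zip-format archive that look compiled.

    archive may be a path or a seekable file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            name
            for name in zf.namelist()
            if name.lower().endswith(COMPILED_SUFFIXES)
        ]
```

What to do with a hit is still open: the whole archive could be deleted, or it could be repacked without the offending members.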

Blacklist
The entries below are examples of output from the file command. We'll need regexes and a much bigger list to get things working.


 * PE32 executable for MS Windows (DLL) (console) Intel 80386 32-bit Mono/.Net assembly
 * ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
 * python 2.5 byte-compiled
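
The three example descriptions above could be generalized into regexes along these lines. This is a sketch in Python, not a tested blacklist: the patterns are guesses at what a real list might contain, and `BLACKLIST_PATTERNS` and `is_blacklisted` are hypothetical names.

```python
import re

# Each pattern is matched against the start of a `file` description.
BLACKLIST_PATTERNS = [
    re.compile(r"^PE32\+? executable"),        # PE32 and PE32+ executables
    re.compile(r"^ELF \d+-bit [LM]SB "),       # any ELF file, either byte order
    re.compile(r"^python \S+ byte-compiled"),  # Python bytecode, any version
]


def is_blacklisted(description):
    """Return True if a `file` description matches any blacklist pattern."""
    return any(rx.search(description) for rx in BLACKLIST_PATTERNS)
```

Anchoring on the leading words of the description avoids listing every architecture, kernel version, and "stripped" variant separately.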