From Fedora Project Wiki

Revision as of 13:42, 2 February 2017

Build as much as possible from sources

There was a discussion about putting everything in Fedora where possible, that is:

  • nvidia-settings
  • nvidia-xconfig
  • nvidia-persistenced
  • egl-wayland
  • libglvnd

This is for a few reasons:

  • Build options (optimizations, no GTK 2 on Fedora, no GTK 3 on RHEL 6).
  • Avoiding multiple "GNOME Software entries" for the various drivers. Richard Hughes asked me to break the dependency between the driver and nvidia-settings. I guess we can still have the main driver package requiring the nvidia-settings control panel, though. This way the "free" components in Fedora do not require non-free components.
  • Easier to maintain: we can just patch / update each component without providing an entirely new driver package.
  • Some parts already follow this pattern (egl-wayland, for example).

This of course would not play well with this:

https://rpmfusion.org/Howto/nVidia?highlight=%28CategoryHowto%29#Latest.2FBeta_driver

The reason is that the source-built components would be tied to specific library versions in the distribution. What we could do is follow this pattern (in order of preference):

  • EPEL: Long lived release
  • Fedora: Short lived release, Long lived release
  • Rawhide: Beta, Short lived release, Long lived release

This way you would reiterate the basic targets of the distributions: a slowly changing target for EPEL and a fast pace for Fedora.

So with the above in mind, an example with fake numbers (there is no short lived release at the moment):

  • EPEL 6: 370.xx
  • EPEL 7: 370.xx
  • Fedora 24: 375.xx
  • Fedora 25: 375.xx
  • Fedora rawhide: 378.xx

If 378 becomes a short lived release, it gets promoted to main Fedora, and so on. This also gives enough time to support new features (for example the latest EGL external platform) without having to address new hardware support quickly.
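The preference order above can be sketched as a small shell helper. This is only an illustration: the function names (branches_for_target, pick_branch) and the branch labels are made up for this page and are not part of any existing tooling.

```shell
# Hypothetical helper: print the driver branches a distribution target would
# track, in order of preference, following the pattern described above.
branches_for_target() {
    case "$1" in
        epel*)     echo "long-lived" ;;
        *rawhide)  echo "beta short-lived long-lived" ;;
        fedora*)   echo "short-lived long-lived" ;;
        *)         echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Pick the first preferred branch that is actually published upstream.
# $1 = target, remaining arguments = branches currently available.
pick_branch() {
    target=$1
    shift
    for wanted in $(branches_for_target "$target"); do
        for available in "$@"; do
            if [ "$wanted" = "$available" ]; then
                echo "$wanted"
                return 0
            fi
        done
    done
    # No acceptable branch is available for this target.
    return 1
}
```

For example, when there is no short lived release, "pick_branch fedora25 beta long-lived" falls back to the long lived branch, while "pick_branch rawhide beta long-lived" selects the beta.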

Source tarballs

If we consider the source building above, we are actually ignoring a lot of things in the driver makeself archive:

  • Non-GLVND GL libraries (useless)
  • libglvnd libraries (built from source in main Fedora)
  • nvidia-settings (built from source in main Fedora)
  • nvidia-modprobe (useless)
  • nvidia-installer (useless)
  • nvidia-persistenced (built from source in main Fedora)
  • libvdpau (built from source in main Fedora)
  • old libraries - TLS, wfb, etc. (useless)

This actually brings down the size of the tarball to almost 50% of the original size. I understand the use of the kmodsrc subpackage in RPMFusion, but since we're already discarding 50% of the tarball, maybe we could regenerate the tarball itself and have separate archives for kernel and user space components. The Fedora packaging guidelines state that you can regenerate the tarball from upstream sources if required.

An example: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-generate-tarballs.sh
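As a rough idea of what such a split involves, here is a minimal sketch. It is not the script linked above. It assumes the makeself archive has already been extracted and that the kernel module sources sit in a top-level kernel/ directory; the function name and the tarball names are hypothetical.

```shell
# Hypothetical sketch: split an already-extracted driver tree into a kernel
# tarball and a user space tarball.
split_driver_tree() {
    tree=$1        # path to the extracted driver tree
    version=$2     # driver version, used only for naming the archives
    outdir=$3      # absolute path of the directory receiving the tarballs

    # Kernel module sources go into their own archive.
    tar -czf "$outdir/nvidia-kmod-$version.tar.gz" -C "$tree" kernel || return 1

    # Everything else becomes the user space archive. The components listed
    # as useless above would be deleted from the tree before this step.
    tar -czf "$outdir/nvidia-driver-$version.tar.gz" \
        --exclude=./kernel -C "$tree" . || return 1
}
```

Each resulting tarball can then be uploaded on its own, so a kernel-only fix does not require touching the user space archive.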

I understand the additional work involved if you have one or two extra tarballs to update, and that's what the kmodsrc is trying to address, but the update would really be:

rpmdev-bumpspec -c "Update to XXX." -n XXXX <specfile>
fedpkg new-source <tarball>

I don't see much work here. This split would bring the following benefits:

  • Smaller tarballs, more than 50% reduction in size in the src.rpm, so faster build times and uploads in Koji
  • Treating the kernel module source as a completely separate package allows you to update it with its own versioning/numbering. This helps both when updating the kernel module (a kernel patch, for example) and when updating the main driver package, as an update there would not trigger a rebuild of the kernel module, etc.
  • An additional kmodsrc can be avoided.

Actually, we could further reduce the size of the kernel module source tarball the way that Debian does:

--- a/nvidia/nvidia.Kbuild
+++ b/nvidia/nvidia.Kbuild
@@ -37,7 +37,11 @@ NVIDIA_KO = nvidia/nvidia.ko
 # and needs to be re-executed.
 #
 
-NVIDIA_BINARY_OBJECT := $(src)/nvidia/nv-kernel.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_X86_32)	+= nv-kernel-i386.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_X86_64)	+= nv-kernel-amd64.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_ARM)	+= nv-kernel-armhf.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_PPC64)	+= nv-kernel-ppc64el.o_binary
+NVIDIA_BINARY_OBJECT := $(src)/nvidia/$(NVIDIA_BINARY_OBJECT-y)
 NVIDIA_BINARY_OBJECT_O := nvidia/nv-kernel.o
 
 quiet_cmd_symlink = SYMLINK $@
--- a/nvidia-modeset/nvidia-modeset.Kbuild
+++ b/nvidia-modeset/nvidia-modeset.Kbuild
@@ -35,7 +35,11 @@ NV_KERNEL_MODULE_TARGETS += $(NVIDIA_MOD
 # But, the target for the symlink rule should be prepended with $(obj).
 #
 
-NVIDIA_MODESET_BINARY_OBJECT := $(src)/nvidia-modeset/nv-modeset-kernel.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_X86_32)	+= nv-modeset-kernel-i386.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_X86_64)	+= nv-modeset-kernel-amd64.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_ARM)	+= nv-modeset-kernel-armhf.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_PPC64)	+= nv-modeset-kernel-ppc64el.o_binary
+NVIDIA_MODESET_BINARY_OBJECT := $(src)/nvidia-modeset/$(NVIDIA_MODESET_BINARY_OBJECT-y)
 NVIDIA_MODESET_BINARY_OBJECT_O := nvidia-modeset/nv-modeset-kernel.o
 
 quiet_cmd_symlink = SYMLINK $@

This way we would have one tarball, a few binary objects and one patch.

Use ldconfig to create symlinks for the libraries while building

By doing this, you can avoid links that are not actually required. Many libraries reference the others only by the file name that carries the full driver version. For example:

$ rpm -ql nvidia-driver-libs.x86_64
/usr/lib64/libEGL_nvidia.so.0
/usr/lib64/libEGL_nvidia.so.375.26
/usr/lib64/libGLESv1_CM_nvidia.so.1
/usr/lib64/libGLESv1_CM_nvidia.so.375.26
/usr/lib64/libGLESv2_nvidia.so.2
/usr/lib64/libGLESv2_nvidia.so.375.26
/usr/lib64/libGLX_indirect.so.0
/usr/lib64/libGLX_nvidia.so.0
/usr/lib64/libGLX_nvidia.so.375.26
/usr/lib64/libnvidia-cfg.so.1
/usr/lib64/libnvidia-cfg.so.375.26
/usr/lib64/libnvidia-egl-wayland.so.375.26
/usr/lib64/libnvidia-eglcore.so.375.26
/usr/lib64/libnvidia-glcore.so.375.26
/usr/lib64/libnvidia-glsi.so.375.26
/usr/lib64/libnvidia-tls.so.375.26
/usr/lib64/vdpau/libvdpau_nvidia.so.1
/usr/lib64/vdpau/libvdpau_nvidia.so.375.26
/usr/share/glvnd/egl_vendor.d/10_nvidia.json

Example: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-driver.spec#L214-L215

These then get packaged in the files section: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-driver.spec#L433-L470

In RPMFusion there are symlinks for every library, even when they do not reflect the SONAME:

https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n222-n236
https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n277

Which are then packaged: https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n535-n591

Nothing special, just that there are some additional links inside the package which are actually never used and that even a normal ldconfig would not create as the shared object names are different.

Split out CUDA libraries from the main CUDA package

Basically create xorg-x11-drv-nvidia-cuda-libs with the CUDA libraries. I think it's quite important.

The reasoning behind this is that if you want to provide a program that requires any of the Nvidia libraries (when you are not allowed to dlopen them at runtime), the libraries have to be present on the system. With the driver and the libraries packaged together, this ends up pulling in all the driver CUDA components.

This is something that you might want to avoid on a non-Nvidia system, and it's also handy for creating subpackages that add support. I know this is not allowed by policy and might not apply to everything, but as an example we have at least the hardware encoding support for Steam In-Home Streaming in RPMFusion that could leverage this.

For example, a user could also create separate packages requiring the functionality:

$ rpm -q --requires steam | egrep cuda
xorg-x11-drv-cuda-libs(x86-32)

Basically Steam needs the 32 bit variant of libnvidia-encode.so.1 (NVENC) on the system before you can use the accelerated hardware encoding. At the moment, this can't be done in RPMFusion as you would need to install the full 32 bit Nvidia driver package even on a 64 bit system.

This way we could also ship things "built" with NVENC support without forcing anyone to have the full-blown driver installed, just the libraries, pretty much like every other codec/format combination.

$ rpm -q --requires ffmpeg-libs.i686 | egrep "cuda|cuvid"
libcuda.so.1()
libnvcuvid.so.1()

Again, FFmpeg libs (32 bit) will pull in just the libraries required and not the full blown driver.

$ rpm -q --requires gstreamer1-plugins-bad-nvenc | grep nvidia
libnvidia-encode.so.1()(64bit)

The libnvcuvid.so.1 library comes from the CUDA libraries part of the Nvidia driver.

This is also helpful if you want to compile stuff for CUDA without having an Nvidia card in the system. For example, you can install CUDA on an Intel-only system with the CUDA libraries from the driver and build the CUDA kernels for Blender. This way, on an Intel system you can build CUDA support for running Blender on an Nvidia-powered system.

Blender in particular dlopens libcuda and libnvrtc if available. The CUDA kernels only need to be installed when CUDA is present, so you can create a subpackage with the hard dependency:

$ rpm -q --requires blender-cuda | grep cuda
cuda-nvrtc
xorg-x11-drv-cuda-libs(x86-64)

Note: Blender packages in Fedora
I've updated Blender in Fedora; CUDA can be enabled without problems just by adding the blender-cuda package alongside the official Blender packages in Fedora.

Preloading nvidia-uvm

nvidia-uvm is required for CUDA stuff. This needs to load at boot if you want CUDA support, so it actually gets pre-loaded when installing the CUDA subpackage (again, this is something that happens only when you want CUDA support, so the above thinking for the libraries applies).

There's a caveat though: you can't just hard-code the preload in the modprobe configuration. You need to load it after the main module when CUDA support is wanted, otherwise it causes problems when rebuilding the initrd.

$ cat /usr/lib/modprobe.d/nvidia-uvm.conf 
# Make a soft dependency for nvidia-uvm as adding the module loading to
# /usr/lib/modules-load.d/nvidia-uvm.conf for systemd consumption, makes the
# configuration file to be added to the initrd but not the module, throwing an
# error on plymouth about not being able to find the module.
# Ref: /usr/lib/dracut/modules.d/00systemd/module-setup.sh

# Even adding the module is not the correct thing, as we don't want it to be
# included in the initrd, so use this configuration file to specify the
# dependency.

softdep nvidia post: nvidia-uvm

There's also the nvidia-modprobe command in the Nvidia drivers that does the same, but that's a SETUID binary that just forcefully loads the module in the running kernel and sets the required permissions for the user. By using the above snippet, we can avoid that.

DKMS kernel modules

Providing this was actually requested by Hans. I guess we can have both (akmods & DKMS) in RPMFusion. DKMS is used by most people on RHEL; it is the default for the ZFS project, and it is used by Dell for updated drivers and by other vendors, including Nvidia in the default makeself archive. I think it would be good to have, even if it would not be the recommended and advertised solution.

Again, regenerating the tarballs as in previous points or just using the kmodsrc subpackage, we can enable this "variant" very easily.

I also have access to the upstream DKMS repository, in case we need to fix/change something quickly.

Kernel tool

Regarding binary kernel modules (kmods) we need to adjust a couple of things (this was also another set of emails between Hans and me):

What we could do now is remove the first two kmodtool copies entirely, move the RPMFusion kmodtool to Fedora, and update it to also generate RHEL kABI modules.

This way we will have one single kmodtool, which can then be used for akmods and kmods for Fedora and kABI kmods for RHEL, all in the split RPMFusion repository.

Obsolete stuff

We can probably remove all the GRUB 1 stuff, all the Group tags, etc. For RHEL, upgrades are not supported and there will be no RHEL 5 support soon; for Fedora, I doubt anyone still has GRUB 1 as their bootloader. Removing RHEL 5 support also means removing libnvidia-wfb, the old TLS libraries, etc.

This also ties in with the source tarball point above.

Default SLI enablement

We can enable SLI in the new OutputClass configuration. I've discovered that it just works if you put it in the config, including modeset=1 in nvidia-drm. On non-SLI systems, you just get a line in Xorg.log/journal saying that the system does not contain multiple cards.

https://github.com/negativo17/nvidia-driver/commit/64c48422115f26bef904d280a8c1bcfd836536aa

I have an SLI system at home to test if needed.

RPM filters

Now that all the libnvidia* and GL libraries are no longer included in the RPMFusion packages, we can probably have simpler filters for the RPM libraries: basically just filter out %{_libdir}/nvidia, which will filter out OpenCL and anything that's left. Any packages that require Nvidia libraries can then just use the automatic provides mechanism of RPM.

Non GL enabled installations of the driver

This is something I have been facing with the CUDA installations at the university I'm helping. There are people with Intel GPU systems that have Tesla/GeForce GPUs in them just for calculation, or with no display at all. The Nvidia makeself installer addresses this through the --no-opengl-libs parameter.

The installer will basically install all the driver components without all the GL stuff, the GLX module, the X config, etc.

I haven't tackled this yet due to time constraints, but I guess that for this we could simply generate different xorg-x11-drv-nvidia and xorg-x11-drv-nvidia-libs subpackages (with a different name, of course) that leave out all the unneeded stuff and conflict with the base ones. This was also one of the reasons why I did not choose a base package name starting with xorg-x11-drv, but we can work around it.