== Build as much as possible from sources ==


There was a discussion about putting everything in Fedora where possible, that is:
* <code>nvidia-settings</code>
* <code>nvidia-xconfig</code>
* <code>nvidia-persistenced</code>
* <code>egl-wayland</code>
* <code>libglvnd</code>
This for a few reasons:
* Build options (optimizations, no GTK 2 on Fedora, no GTK 3 on RHEL 6).
* Avoiding multiple "GNOME Software entries" for the various drivers. Richard Hughes asked me to break the dependency between the driver and <code>nvidia-settings</code>; I guess the main driver package can still require the <code>nvidia-settings</code> control panel, though. This way the "free" components in Fedora do not require non-free components.
* Easier to maintain, we can just patch / update each component without providing an entire new driver package.
* Some parts already follow this pattern (<code>egl-wayland</code>, for example).
This of course would not play well with this:
https://rpmfusion.org/Howto/nVidia?highlight=%28CategoryHowto%29#Latest.2FBeta_driver
The source-built components would be tied to specific library versions in the distribution. What we could do is follow this pattern (in order of preference):
* EPEL: Long lived release
* Fedora: Short lived release, Long lived release
* Rawhide: Beta, Short lived release, Long lived release
This way you would mirror the basic targets of the distributions: a slowly changing target for EPEL and a fast pace for Fedora.
So with the above in mind, an example with fake numbers (there is no short lived release at the moment):
* EPEL 6: 370.xx
* EPEL 7: 370.xx
* Fedora 24: 375.xx
* Fedora 25: 375.xx
* Fedora rawhide: 378.xx
If 378 becomes a short lived release, it gets promoted to the main Fedora releases, and so on. This also gives enough time to support new features (for example the latest EGL external platform interface) without having to rush support for new hardware.
{{admon/important|Legacy drivers|What about the 340.xx driver series?}}
== Source tarballs ==
If we consider the source building above, we are actually ignoring a lot of things in the driver makeself archive:
* Non-GLVND GL libraries (useless)
* <code>libglvnd</code> libraries (built from source in main Fedora)
* <code>nvidia-settings</code> (built from source in main Fedora)
* <code>nvidia-modprobe</code> (useless)
* <code>nvidia-installer</code> (useless)
* <code>nvidia-persistenced</code> (built from source in main Fedora)
* <code>libvdpau</code> (built from source in main Fedora)
* old libraries - TLS, wfb, etc. (useless)
This alone brings the tarball down to roughly 50% of its original size. I understand the use of the <code>kmodsrc</code> subpackage in RPMFusion, but since we're already discarding 50% of the tarball, maybe we could regenerate the tarball itself and have separate archives for the kernel and user-space components. The Fedora packaging guidelines state that you can regenerate the tarball from upstream sources if required.
An example: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-generate-tarballs.sh
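In rough terms, the regeneration boils down to extracting the makeself archive and repacking only what we ship into separate kernel and user-space tarballs. A sketch, with an example version number and an illustrative (incomplete) file list; the real lists are in the script linked above:
<pre>
# Unpack the driver archive without installing anything
sh NVIDIA-Linux-x86_64-375.26.run --extract-only
cd NVIDIA-Linux-x86_64-375.26

# Kernel module sources get their own tarball...
tar cJf nvidia-kmod-375.26.tar.xz kernel/

# ...and the user-space bits we actually keep go into another one
tar cJf nvidia-driver-375.26.tar.xz \
    nvidia_drv.so nvidia-smi \
    libGLX_nvidia.so.375.26 libEGL_nvidia.so.375.26 libnvidia-*.so.375.26
</pre>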
I understand the additional work involved when you have one or two extra tarballs to update, and that's what <code>kmodsrc</code> is trying to address, but the update would really be just:
<pre>
rpmdev-bumpspec -c "Update to XXX." -n XXXX <specfile>
fedpkg new-sources <tarball>
</pre>
I don't see much work here. This split would bring the following benefits:
* Smaller tarballs, more than 50% reduction in the size of the src.rpm, so faster build times and uploads in Koji.
* Treating the kernel source module as a completely separate package allows you to update it with its own versioning/numbering. This helps both when updating the kernel module (for a kernel patch, for example) and when updating the main driver package, as an update there would not trigger a rebuild of the kernel modules.
* An additional <code>kmodsrc</code> can be avoided.
Actually, we could further reduce the size of the kernel module source tarball the way Debian does:
<pre>
--- a/nvidia/nvidia.Kbuild
+++ b/nvidia/nvidia.Kbuild
@@ -37,7 +37,11 @@ NVIDIA_KO = nvidia/nvidia.ko
# and needs to be re-executed.
#
-NVIDIA_BINARY_OBJECT := $(src)/nvidia/nv-kernel.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_X86_32) += nv-kernel-i386.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_X86_64) += nv-kernel-amd64.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_ARM) += nv-kernel-armhf.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_PPC64) += nv-kernel-ppc64el.o_binary
+NVIDIA_BINARY_OBJECT := $(src)/nvidia/$(NVIDIA_BINARY_OBJECT-y)
NVIDIA_BINARY_OBJECT_O := nvidia/nv-kernel.o
quiet_cmd_symlink = SYMLINK $@
--- a/nvidia-modeset/nvidia-modeset.Kbuild
+++ b/nvidia-modeset/nvidia-modeset.Kbuild
@@ -35,7 +35,11 @@ NV_KERNEL_MODULE_TARGETS += $(NVIDIA_MOD
# But, the target for the symlink rule should be prepended with $(obj).
#
-NVIDIA_MODESET_BINARY_OBJECT := $(src)/nvidia-modeset/nv-modeset-kernel.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_X86_32) += nv-modeset-kernel-i386.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_X86_64) += nv-modeset-kernel-amd64.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_ARM) += nv-modeset-kernel-armhf.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_PPC64) += nv-modeset-kernel-ppc64el.o_binary
+NVIDIA_MODESET_BINARY_OBJECT := $(src)/nvidia-modeset/$(NVIDIA_MODESET_BINARY_OBJECT-y)
NVIDIA_MODESET_BINARY_OBJECT_O := nvidia-modeset/nv-modeset-kernel.o
quiet_cmd_symlink = SYMLINK $@
</pre>
This way we would have one tarball, a few binary objects and one patch.
== Use <code>ldconfig</code> to create symlinks for the libraries while building ==
By doing this, you can avoid links that are not actually required: many of the libraries are required by the others only through the file name carrying the full driver version. For example:
<pre>
$ rpm -ql nvidia-driver-libs.x86_64
/usr/lib64/libEGL_nvidia.so.0
/usr/lib64/libEGL_nvidia.so.375.26
/usr/lib64/libGLESv1_CM_nvidia.so.1
/usr/lib64/libGLESv1_CM_nvidia.so.375.26
/usr/lib64/libGLESv2_nvidia.so.2
/usr/lib64/libGLESv2_nvidia.so.375.26
/usr/lib64/libGLX_indirect.so.0
/usr/lib64/libGLX_nvidia.so.0
/usr/lib64/libGLX_nvidia.so.375.26
/usr/lib64/libnvidia-cfg.so.1
/usr/lib64/libnvidia-cfg.so.375.26
/usr/lib64/libnvidia-egl-wayland.so.375.26
/usr/lib64/libnvidia-eglcore.so.375.26
/usr/lib64/libnvidia-glcore.so.375.26
/usr/lib64/libnvidia-glsi.so.375.26
/usr/lib64/libnvidia-tls.so.375.26
/usr/lib64/vdpau/libvdpau_nvidia.so.1
/usr/lib64/vdpau/libvdpau_nvidia.so.375.26
/usr/share/glvnd/egl_vendor.d/10_nvidia.json
</pre>
Example: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-driver.spec#L214-L215
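In practice this is just a couple of <code>ldconfig</code> calls at the end of <code>%install</code>, something along these lines (a sketch, with paths as in the example spec above):
<pre>
# Create only the symlinks matching the SONAME recorded in each shared object,
# instead of maintaining a hand-written list of ln -s calls:
/sbin/ldconfig -n %{buildroot}%{_libdir}/
/sbin/ldconfig -n %{buildroot}%{_libdir}/vdpau/
</pre>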
The resulting links are then packaged in the files section: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-driver.spec#L433-L470
In RPMFusion there are symlinks for every library, even when they do not reflect the SONAME:
https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n222-n236
https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n277
Which are then packaged:
https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n535-n591
Nothing special, just that there are some additional links inside the package which are never actually used, and which even a normal <code>ldconfig</code> run would not create, as the shared object names are different from the SONAMEs.
== Split out CUDA libraries from the main CUDA package ==
Basically create <code>xorg-x11-drv-nvidia-cuda-libs</code> with the CUDA libraries. I think it's quite important.
The reasoning behind this is that if you want to provide a program that requires any of the Nvidia libraries (when you are not allowed to DLopen them at runtime), you need to provide those libraries on the system; with the driver part and the libraries bundled together, this ends up pulling in all the driver's CUDA components.
This is something that you might want to avoid on a non-Nvidia system, and it's also handy for creating subpackages that add support. I know this is not allowed by the policies and might not apply to everything, but as an example we have at least the hardware encoding support for Steam In-Home Streaming in RPMFusion that could leverage this.
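A minimal sketch of what the split could look like in the spec; the subpackage name and the file list are illustrative, not the complete set of CUDA libraries shipped by the driver:
<pre>
%package cuda-libs
Summary: CUDA, NVENC and NVCUVID libraries from the Nvidia driver

%description cuda-libs
CUDA libraries split out of the main driver packages, so that they can be
installed on their own (also as the 32 bit variant) without pulling in the
full driver.

%files cuda-libs
%{_libdir}/libcuda.so.1
%{_libdir}/libcuda.so.%{version}
%{_libdir}/libnvcuvid.so.1
%{_libdir}/libnvcuvid.so.%{version}
%{_libdir}/libnvidia-encode.so.1
%{_libdir}/libnvidia-encode.so.%{version}
</pre>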
For example, a user could also create separate packages requiring the functionality:
<pre>
$ rpm -q --requires steam | egrep cuda
xorg-x11-drv-cuda-libs(x86-32)
</pre>
Basically Steam needs the 32 bit variant of <code>libnvidia-encode.so.1</code> (NVENC) on the system before you can use the accelerated hardware encoding. At the moment, this can't be done in RPMFusion as you would need to install the full 32 bit Nvidia driver package even on a 64 bit system.
Also, this way we could ship things built with NVENC support without forcing anyone to install the full-blown driver, just the libraries, pretty much like every other codec/format combination.
<pre>
$ rpm -q --requires ffmpeg-libs.i686 | egrep "cuda|cuvid"
libcuda.so.1()
libnvcuvid.so.1()
</pre>
Again, FFmpeg libs (32 bit) will pull in just the libraries required and not the full blown driver.
<pre>
$ rpm -q --requires gstreamer1-plugins-bad-nvenc | grep nvidia
libnvidia-encode.so.1()(64bit)
</pre>
The <code>libnvcuvid.so.1</code> library comes from the CUDA libraries part of the Nvidia driver.
This is also helpful if you want to compile CUDA programs without having an Nvidia card in the system. For example, you can install CUDA on an Intel-only system along with the CUDA libraries from the driver and build the CUDA kernels for Blender there, to be used on an Nvidia-powered system.
Blender in particular DLopens <code>libcuda</code> and <code>libnvrtc</code> if available. The CUDA kernels only need to be installed along with CUDA, so you can create a subpackage with a hard dependency:
<pre>
$ rpm -q --requires blender-cuda | grep cuda
cuda-nvrtc(x86-64)
xorg-x11-drv-cuda-libs(x86-64)
</pre>
{{admon/note|Blender packages in Fedora|I've updated Blender in Fedora; CUDA can be enabled without problems just by adding the <code>blender-cuda</code> package alongside the official Blender packages in Fedora.}}
== Preloading nvidia-uvm ==
The <code>nvidia-uvm</code> module is required for CUDA. It needs to be loaded at boot if you want CUDA support, so it gets loaded when installing the CUDA subpackage (again, this happens only when you want CUDA support, so the same reasoning as for the libraries above applies).
There's a caveat, though: you can't just hard-load it in the modprobe configuration. You need to load it after the main module when CUDA support is wanted, or it causes problems with the rebuilding of the <code>initrd</code>.
<pre>
$ cat /usr/lib/modprobe.d/nvidia-uvm.conf
# Make a soft dependency for nvidia-uvm as adding the module loading to
# /usr/lib/modules-load.d/nvidia-uvm.conf for systemd consumption, makes the
# configuration file to be added to the initrd but not the module, throwing an
# error on plymouth about not being able to find the module.
# Ref: /usr/lib/dracut/modules.d/00systemd/module-setup.sh
# Even adding the module is not the correct thing, as we don't want it to be
# included in the initrd, so use this configuration file to specify the
# dependency.
softdep nvidia post: nvidia-uvm
</pre>
There's also the <code>nvidia-modprobe</code> command in the Nvidia drivers that does the same, but that's a SETUID binary that just forcefully loads the module in the running kernel and sets the required permissions for the user. By using the above snippet, we can avoid that.
== DKMS kernel modules ==
Actually, just providing this was requested by Hans. I guess we can have both (''akmods'' & ''DKMS'') in RPMFusion. DKMS is used by most people on RHEL; it's the default for the ZFS project, it's used by Dell for updated drivers and by other vendors, including Nvidia in the default makeself archive. I think it would be good to have, even if it is not the recommended and advertised solution.
Again, by regenerating the tarballs as in the previous points, or just by using the <code>kmodsrc</code> subpackage, we can enable this "variant" very easily.
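For reference, a <code>dkms.conf</code> for the regenerated kernel source tarball could look roughly like this; the module list and the <code>MAKE</code> line are assumptions that would have to match the driver's own Kbuild files, and the version is just an example:
<pre>
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="375.26"
# Modules built from the kernel/ part of the driver
BUILT_MODULE_NAME[0]="nvidia"
BUILT_MODULE_NAME[1]="nvidia-modeset"
BUILT_MODULE_NAME[2]="nvidia-drm"
BUILT_MODULE_NAME[3]="nvidia-uvm"
DEST_MODULE_LOCATION[0]="/kernel/drivers/video"
DEST_MODULE_LOCATION[1]="/kernel/drivers/video"
DEST_MODULE_LOCATION[2]="/kernel/drivers/video"
DEST_MODULE_LOCATION[3]="/kernel/drivers/video"
MAKE[0]="make modules KERNEL_UNAME=${kernelver}"
CLEAN="make clean"
AUTOINSTALL="yes"
</pre>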
I also have access to the upstream DKMS repository, in case we need to fix/change something quickly.
== Kernel tool ==
Regarding binary kernel modules (''kmods'') we need to adjust a couple of things (this was also another set of emails between Hans and me):
* I ship a <code>kmodtool</code> for generating ''kABI kmods'' in RHEL packages.
* A very old <code>kmodtool</code> is shipped in Fedora inside redhat-rpm-config: http://pkgs.fedoraproject.org/cgit/rpms/redhat-rpm-config.git/tree/kmodtool
* RPMFusion ships a different version of <code>kmodtool</code> in the <code>kmodtool</code> package.
What we could do now is remove the first two <code>kmodtool</code> copies entirely, move the RPMFusion <code>kmodtool</code> to Fedora, and update it so it can also generate RHEL ''kABI'' modules.
This way we will have one single <code>kmodtool</code>, which can then be used for ''akmods'' and ''kmods'' for Fedora and ''kABI kmods'' for RHEL, all in the split RPMFusion repository.
== Obsolete stuff ==
We can probably remove all the GRUB 1 stuff, all the Group tags, etc. For RHEL, upgrades are not supported and RHEL 5 support will be dropped soon; for Fedora, I doubt anyone still has GRUB 1 as their bootloader. Removing RHEL 5 support also means removing ''libnvidia-wfb'', the old TLS libraries, etc.
This also ties with the source tarball point above.
== Default SLI enablement ==
We can enable SLI in the new OutputClass configuration; I've discovered that it just works if you put it in the config, including setting <code>modeset=1</code> for <code>nvidia-drm</code>. On non-SLI systems, you just get a line in <code>Xorg.log</code>/the journal saying that the system does not contain multiple cards.
https://github.com/negativo17/nvidia-driver/commit/64c48422115f26bef904d280a8c1bcfd836536aa
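The configuration itself boils down to something like this (a sketch; the identifier and exact option set are illustrative, the commit above has the real one):
<pre>
# /usr/share/X11/xorg.conf.d/nvidia.conf
Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "SLI" "Auto"
    Option "AllowEmptyInitialConfiguration"
EndSection

# /usr/lib/modprobe.d/nvidia-drm.conf
options nvidia-drm modeset=1
</pre>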
I have an SLI system at home to test with if needed.
== RPM filters ==
Now that all the <code>libnvidia*</code> and GL libraries are no longer included in the RPMFusion packages, we can probably have simpler filters for the RPM dependencies: basically just filter out <code>%{_libdir}/nvidia</code>, and this will cover OpenCL and anything else that's left. Then, any package requiring the Nvidia libraries can just rely on the automatic Provides mechanism of RPM.
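With the file-based dependency filtering macros this boils down to a couple of lines in the spec (a sketch, assuming all the private libraries live under <code>%{_libdir}/nvidia</code>):
<pre>
# Do not generate automatic Provides/Requires for the private driver libraries
# (OpenCL ICD, X.org modules and anything else left in that directory):
%global __provides_exclude_from ^%{_libdir}/nvidia/.*$
%global __requires_exclude_from ^%{_libdir}/nvidia/.*$
</pre>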
== <code>libvdpau</code> update in EPEL ==
We actually need to update <code>libvdpau</code> in EPEL to support the additional decoding options provided by the current drivers. If there is an API/ABI discrepancy we can also rebuild the additional packages that depend on it.
= Stuff for later... =
== Non GL enabled installations of the driver ==
This is something I have been facing with the CUDA installations at the university I'm helping out, and there have been quite a few requests for it. In the Nvidia makeself installer this is addressed by the <code>--no-opengl-libs</code> parameter.
This targets:
* People with Intel GPU systems and Tesla/GeForce GPUs in the system just for calculation.
* Tesla clusters without display at all.
The installer will basically install all the driver components without all the GL stuff, the GLX module, the X config, etc.
This actually needs to be done at the package level, as the current package pulls in ''X.org'' and a lot of other libraries that should not be installed on a terminal-only system.
I haven't tackled this yet due to time constraints, but I guess we could simply generate different <code>xorg-x11-drv-nvidia</code> and <code>xorg-x11-drv-nvidia-libs</code> subpackages (with a different name, of course) that conflict with the base ones and leave out all the non-needed stuff. This was also one of the reasons why I did not choose a base package name starting with <code>xorg-x11-drv</code>, but we can work around it.
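Something along these lines, for example; the subpackage name is purely illustrative:
<pre>
%package -n nvidia-driver-headless
Summary: Nvidia driver for compute-only systems, without the GL/X components
# Cannot be installed together with the full, GL-enabled driver:
Conflicts: xorg-x11-drv-nvidia

%description -n nvidia-driver-headless
Kernel module, persistence daemon and CUDA libraries only; no X.org driver
and no GLX/EGL/GLES libraries.
</pre>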
== Hardening of persistence daemon ==
Add the additional hardening options to the systemd unit file, for example:
http://git.scrit.ch/srpm/python-onionbalance/tree/SOURCES/onionbalance.service#n25
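For example, something in this direction; the exact set of options is an assumption and would have to be validated against what <code>nvidia-persistenced</code> actually needs (in particular it must keep access to the <code>/dev/nvidia*</code> devices):
<pre>
[Service]
ProtectSystem=full
ProtectHome=true
PrivateTmp=true
NoNewPrivileges=true
</pre>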
