Bisecting Fedora kernel

This post shows how to bisect a Fedora kernel to find the source of a regression. I needed to do this recently and found no good guide, so I’m capturing my notes here; perhaps you’ll find them useful. This approach lets you identify the exact commit that caused bad kernel behavior on your hardware, so that you can report it to the kernel maintainers. Note that you need a reliable way of reproducing the problem. If it happens randomly and infrequently, it’s much harder to debug.

0. Try the latest Rawhide kernel

Before you spend too much time on this, it’s always worth a shot to test the latest Rawhide kernel. Perhaps the bug is fixed already?

Usually the kernel consists of these installed packages: kernel, kernel-core, kernel-modules, kernel-modules-core, kernel-modules-extra. But see what you have installed on your system, e.g. with: rpm -qa | grep ^kernel | sort .

Install the latest Rawhide kernel:

sudo dnf update --setopt=installonly_limit=0 --repo fedora --releasever rawhide kernel{,-core,-modules,-modules-core,-modules-extra}

You want to use --setopt=installonly_limit=0 throughout this exercise to make sure you don’t accidentally remove a working kernel from your system and don’t end up with just broken ones (there’s a limit of three kernels installed at the same time by default). But it means you’ll need to remove tested kernels manually from time to time, otherwise you run out of space in /boot.

Reboot and keep pressing F8 during startup to display the GRUB boot menu. Make sure to select the newly installed kernel, boot it, test it. Note down whether it’s good or bad. If the problem is still there, we’ll need to continue debugging.

Note: When you want to remove that tested kernel, obviously you can’t be currently running from it. Then use standard dnf remove to get rid of it, or use dnf history for a more convenient way (e.g. dnf history undo last).
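
A minimal sketch of how you might list the kernels that are safe to remove. It assumes that kernel-core is installed for every kernel and that the output of uname -r matches rpm’s VERSION-RELEASE.ARCH string; the function name removable_kernels is my own and not part of any tool:

```shell
# removable_kernels RUNNING
# Reads kernel versions (one per line) on stdin and prints every version
# except RUNNING, i.e. the kernels you are not currently booted into.
removable_kernels() {
    # -x: match whole lines, -F: fixed string, -v: invert the match.
    # "|| true" keeps the exit status at 0 when nothing is left to print.
    grep -v -x -F -- "$1" || true
}

# Typical use on a live system (not run here):
#   rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' \
#       | removable_kernels "$(uname -r)"
```

Each printed version is a candidate for dnf remove once you have noted down whether it was good or bad.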

I. Narrow down the issue in Fedora-packaged kernels

As the first step, it’s useful to figure out which Fedora-packaged kernel is the last one with good behavior (a “good kernel“), and which one is the first one with bad behavior (a “bad kernel“). That will help you narrow down the scope. It’s much faster to download and install already built kernels than to compile your own (which we’ll do later).

Most probably you’re currently running a bad kernel (because you’re reading this). So reboot, display the GRUB boot menu and boot an older kernel. See if it’s good or bad, note it down. Unless the problem is very recent, all available kernels (usually three) in the GRUB menu will be bad. It’s time to start downloading older kernels from Koji. Use a reasonable strategy, e.g. install a month old kernel, or several months old, and gradually halve the intervals and narrow down until you find the latest good kernel. You don’t need to worry about using kernels from other Fedora releases (as you can see in their .fcNN suffix), they are standalone and work in any release. You can download the kernel subpackages manually, or use koji command (from the koji package), e.g.:

koji download-build --arch x86_64 kernel-6.5.0-0.rc6.43.fc39

That downloads many more subpackages than you need, so install just those needed (see the previous section), e.g. like this:

sudo dnf --setopt=installonly_limit=0 install ./kernel{,-core,-modules,-modules-core,-modules-extra}-6.5*.rpm

For each picked kernel, install it, boot into it, test it, note down whether it’s good or bad. Continue until you’ve found the latest good packaged kernel and the first bad packaged kernel.
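
The halving strategy can be sketched as a tiny helper. It assumes you keep a hand-written text file of candidate builds, one per line and oldest first (the file and the function name next_candidate are hypothetical), and that you track the line numbers of the newest known-good and oldest known-bad build yourself:

```shell
# next_candidate LIST_FILE GOOD_LINE BAD_LINE
# Prints the build halfway between the newest known-good line and the oldest
# known-bad line. Returns 1 when the two are adjacent, i.e. you are done.
next_candidate() {
    list=$1
    good=$2
    bad=$3
    if [ $((bad - good)) -le 1 ]; then
        return 1   # nothing left in between
    fi
    mid=$(( (good + bad) / 2 ))
    sed -n "${mid}p" "$list"
}

# Example (not run here): with builds.txt holding 8 builds, where line 1 is
# known good and line 8 is known bad, the next build to test is on line 4:
#   next_candidate builds.txt 1 8
```

After testing the printed build, replace either the good or the bad line number with the midpoint and call the helper again.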

II. Find git commits used for building identified good and bad kernels

Now that you have the closest good and bad packaged kernel, we need to figure out which git commits from the upstream Linux kernel were used to build them. In some cases, the git commit hash is included directly in the RPM filename. For example in my case, I reported that kernel-6.4.0-0.rc0.20230427git6e98b09da931.5.fc39 is the last good kernel, and kernel-6.4.0-0.rc0.20230428git33afd4b76393.7.fc39 is the first bad kernel. From those filenames, you can see that git commit 6e98b09da931 is good and git commit 33afd4b76393 is bad.
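
If you have several build names to go through, the hash can be pulled out mechanically. This is only a sketch that assumes the "git" + hash + "." naming pattern seen in the two builds above; the function name commit_from_nvr is made up for this example:

```shell
# commit_from_nvr NVR
# Prints the abbreviated commit hash embedded in a snapshot build name, or
# nothing if the name carries no "git<hash>" component.
commit_from_nvr() {
    printf '%s\n' "$1" | sed -n 's/.*git\([0-9a-f]\{7,40\}\)\..*/\1/p'
}

commit_from_nvr kernel-6.4.0-0.rc0.20230427git6e98b09da931.5.fc39   # prints 6e98b09da931
commit_from_nvr kernel-6.4.0-0.rc0.20230428git33afd4b76393.7.fc39   # prints 33afd4b76393
commit_from_nvr kernel-6.5.0-0.rc6.43.fc39                          # prints nothing
```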

The commit hash is not always part of the filename, as the example of kernel-6.5.0-0.rc6.43.fc39 shows. In this case, you need to download the .src.rpm file from that build, either manually from Koji, or using:

koji download-build --arch src kernel-6.5.0-0.rc6.43.fc39

Unpack that .src.rpm (my favorite decompress tool is deco), find linux-*.tar.xz archive and run the following command (adjust the archive filename):

$ xzcat -qq linux-6.5-rc6.tar.xz | git get-tar-commit-id
2ccdd1b13c591d306f0401d98dedc4bdcd02b421

(This command is documented in the kernel.spec file, also in that directory). Now you know the git commit hash used for that kernel build. Figure out commits for both the good and bad kernel you identified.

III. Use git bisect to find the exact commit that broke it

It’s time to clone the upstream Linux kernel repo:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/src/linux

And also the Fedora distgit kernel repo:

fedpkg clone -a ~/distgit/kernel

We’ll now use git bisect to arrive at the breaking commit which caused the problem. After each step, we’ll need to build the kernel, test it, and mark it as good or bad. Let’s start:

cd ~/src/linux
git bisect start
git bisect good YOUR_GOOD_COMMIT
git bisect bad YOUR_BAD_COMMIT

Git now prints a commit hash to be tested (and switches the repository to that commit), and an estimate of how many steps remain. We now need to take the current contents of the source code and build our own kernel.

Note: When building the kernel, I was advised to avoid the overhead of packaging, to speed up the process. I’m sure it’s good advice, but I didn’t find a good guide on how to do that (including how to retrieve the Fedora kernel config, build the kernel manually, copy it to the right places, create initramfs, create a boot option in GRUB, etc). So I just ran the whole process including packaging. On my machine, the compilation took about 40 minutes and packaging took 10 minutes, and I needed to do about 11 rounds, so it was an OK tradeoff for me. (If you can write a guide on how to do that without packaging, please do and link it in the comments, I’d love to read it).
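
To get a feel for the total cost before you start, here is a rough estimate under simple assumptions: each round halves the remaining range, and every round takes the same time. The commit count of 2000 is a made-up example, and git’s own estimate can differ slightly because history is not linear:

```shell
# bisect_steps N
# Prints ceil(log2(N)): roughly how many builds are needed to bisect N commits.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # each tested build halves the remaining range
        steps=$((steps + 1))
    done
    printf '%s\n' "$steps"
}

# Hypothetical range of 2000 commits; on a real clone you could count them with
#   git rev-list --count YOUR_GOOD_COMMIT..YOUR_BAD_COMMIT
commits=2000
minutes_per_round=50   # 40 min compilation + 10 min packaging on my machine
rounds=$(bisect_steps "$commits")
echo "$rounds rounds, about $((rounds * minutes_per_round)) minutes of building"
# prints: 11 rounds, about 550 minutes of building
```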

Let’s create a tarball of the current source code like this:

git archive --prefix=linux-local/ HEAD | xz -0 -T0 > linux-local.tar.xz

Usually the tarballs have a version number in both the filename and the included directory (which is then also matched in a spec file). You can do that if you wish. I didn’t want to spend too much time on throwaway builds, so I just used a static filename and overwrote it each time.

Let’s move the tarball to the distgit repo:

mv ~/src/linux/linux-local.tar.xz ~/distgit/kernel/

Now we need to adjust the distgit spec file a bit:

cd ~/distgit/kernel
# edit kernel.spec

I made the following changes to the spec file:

-# define buildid .local
+%define buildid .local
-%define specrpmversion 6.4.9
+%define specrpmversion 6.4.0
-%define specversion 6.4.9
+%define specversion 6.4.0
-%define tarfile_release 6.4.9
+%define tarfile_release local
-%define specrelease 200%{?buildid}%{?dist}
+%define specrelease 0.gitYOUR_TESTED_COMMIT%{?buildid}%{?dist}
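
Only the specrelease line changes between rounds, so that one edit can be scripted. This is a sketch, not part of fedpkg or the kernel tooling; the helper name set_specrelease is my own, and it assumes the spec file contains exactly one line starting with "%define specrelease":

```shell
# set_specrelease SPEC_FILE COMMIT
# Rewrites the specrelease line so that the built RPMs carry the tested commit.
set_specrelease() {
    spec=$1
    commit=$2
    tmp=$(mktemp) || return 1
    # Write to a temporary file first, then copy back, so the spec file keeps
    # its original permissions and is never left half-written by sed.
    sed "s/^%define specrelease .*/%define specrelease 0.git${commit}%{?buildid}%{?dist}/" \
        "$spec" > "$tmp" && cat "$tmp" > "$spec"
    rm -f "$tmp"
}

# Example (not run here), using the commit git bisect has just checked out:
#   set_specrelease ~/distgit/kernel/kernel.spec \
#       "$(git -C ~/src/linux rev-parse --short=12 HEAD)"
```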

Now we can start the build:

nice fedpkg mockbuild --with baseonly --with vanilla --without debuginfo

Options --with baseonly and --without debuginfo make sure we don’t build unnecessary stuff. --with vanilla was needed, because Fedora-specific patches didn’t apply to the older source code.

After a long time, your results should be available in results_kernel/ and look something like this:

$ ls -1 results_kernel/6.4.0/0.git6e98b09da931.local.fc38/
build.log
hw_info.log
installed_pkgs.log
kernel-6.4.0-0.git6e98b09da931.local.fc38.src.rpm
kernel-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-core-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-devel-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-devel-matched-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-core-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-extra-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-internal-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-uki-virt-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
root.log
state.log

See that all the RPMs have the git commit hash identifier that you specified in the spec file. Now you just need to install the kernel (see in a previous section), boot it (make sure to display the GRUB menu and verify that the correct kernel is selected), and test it.
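
The results directory contains more RPMs than you need to install. A small sketch that picks just the five subpackages mentioned earlier (the function name needed_rpms is hypothetical, and x86_64 is assumed):

```shell
# needed_rpms RESULTS_DIR
# Prints the paths of kernel, kernel-core, kernel-modules, kernel-modules-core
# and kernel-modules-extra, skipping devel, internal, uki and source packages.
needed_rpms() {
    dir=$1
    for sub in "" -core -modules -modules-core -modules-extra; do
        # The version starts with a digit, so "kernel-[0-9]*" cannot match
        # kernel-core-..., and "kernel-modules-[0-9]*" cannot match
        # kernel-modules-core-... and friends.
        for f in "$dir"/kernel"$sub"-[0-9]*.x86_64.rpm; do
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}

# Example (not run here):
#   sudo dnf --setopt=installonly_limit=0 install \
#       $(needed_rpms results_kernel/6.4.0/0.git6e98b09da931.local.fc38)
```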

Note: If you have Secure Boot enabled, you’ll need to disable it in order to boot your own kernel (or figure out how to sign it yourself). Don’t forget to re-enable it once this is all over.

Once you’ve determined whether this kernel is good or bad, tell it to git bisect:

cd ~/src/linux
git bisect good   # or bad

And now the whole cycle repeats. Create a new archive using git archive, move it to the distgit directory, adjust the specrelease field in kernel.spec to match the new commit hash, and use fedpkg to build another kernel. Eventually, git bisect will print out the exact commit that caused the problem.

IV. Report your findings

Report the problem and the identified breaking commit into Red Hat Bugzilla under the kernel component. Please also save and attach the bisect log:

cd ~/src/linux
git bisect log > git-bisect-log.txt

Then also report this problem (possibly a regression) to the kernel upstream and mention it in the RH Bugzilla ticket. Thanks and good luck.

New Fedora package: ntfs-3g-system-compression

If you have a Windows 10 installation, you might not be able to read all files on its NTFS partition. Under certain conditions, Microsoft compresses system files with new compression algorithms which the ntfs-3g driver can’t currently read. Files are displayed with question marks when listed using ls, and you’ll see Input/output error or unsupported reparse point when trying to access these files. Here’s an example:

$ ls -l Windows
...
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33  ModemLogs
-????????? ? ?      ?             ?            ?  notepad.exe
drwxrwxrwx 1 kparal kparal        0 Dec 14 21:57  OCR
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33 'Offline Web Pages'
drwxrwxrwx 1 kparal kparal    16384 Dec 14 14:17  Panther
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33  Performance
-rwxrwxrwx 1 kparal kparal   984966 Feb 13 23:15  PFRO.log
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33  PLA
drwxrwxrwx 1 kparal kparal    49152 Dec 14 21:59  PolicyDefinitions
drwxrwxrwx 1 kparal kparal   163840 Feb 14 22:41  Prefetch
drwxrwxrwx 1 kparal kparal     4096 Dec 14 14:15  PrintDialog
-????????? ? ?      ?             ?            ?  Professional.xml
drwxrwxrwx 1 kparal kparal     4096 Sep 15 09:33  Provisioning
-????????? ? ?      ?             ?            ?  regedit.exe
drwxrwxrwx 1 kparal kparal        0 Dec 14 22:09  Registration
drwxrwxrwx 1 kparal kparal        0 Sep 15 11:11  RemotePackages
...

$ ls -l Windows/notepad.exe
ls: cannot access 'Windows/notepad.exe': Input/output error

$ cp Windows/notepad.exe .
cp: cannot stat 'Windows/notepad.exe': Input/output error

$ stat Windows/notepad.exe
File: Windows/notepad.exe -> unsupported reparse point
Size: 25            Blocks: 0          IO Block: 4096   symbolic link
Device: 803h/2051d    Inode: 247077      Links: 3
Access: (0777/lrwxrwxrwx)  Uid: ( 1000/  kparal)   Gid: ( 1000/  kparal)
Access: 2019-02-14 22:40:13.270993900 +0100
Modify: 2018-09-15 09:28:56.687095900 +0200
Change: 2018-12-14 21:52:10.685553700 +0100
Birth: -

Fortunately, there’s a ntfs-3g-system-compression plugin that allows you to read those files:

$ ls -l Windows/notepad.exe 
-r-xr-xr-x 3 kparal kparal 254464 Sep 15 09:28 Windows/notepad.exe

The new package is now proposed as an update in Bodhi, but in a week or so you should be able to install it with a simple:

$ sudo dnf install ntfs-3g-system-compression

Enjoy.

Whitelisting rpmlint errors in Taskotron/Bodhi

If you submit a new Fedora update into Bodhi, you’ll see an Automated Tests tab on that update page (an example), and one of the test results (once it’s done) will be from rpmlint. If you click on it, you’ll get a full log with rpmlint output.

If you wish to whitelist some errors which are not relevant for your package or are clearly a mistake (like spelling issues, etc), it is now possible. The steps are described at:

https://fedoraproject.org/wiki/Taskotron/Tasks/dist.rpmlint#Whitelisting_errors

This has been often requested, so hopefully this will help you have the automated tests results all in green, instead of being bothered by invalid errors. If something doesn’t work, and it seems to be our bug in how we execute rpmlint (instead of a bug in rpmlint itself), please file a bug in task-rpmlint or contact us (qa-devel mailing list, #fedora-qa IRC channel on Freenode).

glxosd and voglperf now available for Fedora in COPR

For all our gaming enthusiasts, I packaged glxosd and voglperf for Fedora and you can find them in my COPR repositories: glxosd COPR and voglperf COPR.

These tools allow you to have FRAPS-like features on Linux, i.e. show an overlay in OpenGL games/apps to display current FPS, and also capture the frame times into a file and plot them to a graph later. So you can now use it with any Linux game and fine-tune its graphics settings to match your preferred performance. Or you can see when your CPU or GPU is overheating. Or you can contribute to Open Game Benchmarks. Or something else.

This is an example of the glxosd overlay in action (don’t worry, its output is configurable):

[Image: glxosd overlay (glxosd-chivalry.png)]

And if you want, you can later plot the performance into such pretty graphs using this awesome glxosd analyser web page:

[Image: fps graph (glxosdGraph-fps.png)]

[Image: frame times graph (glxosdGraph-frametimes.png)]

And this is an example of the voglperf overlay (top left corner):

[Image: voglperf overlay (voglperf-xcom.png)]

And a generated graph:

[Image: frame times graph (voglperf-frametimes.png)]

There are other similar tools which you can use, but I don’t know about any that is generic and has all these features. There is of course the Steam FPS overlay, but you can only use it for Steam games, and it can’t log frame information. There’s also GALLIUM_HUD, but that’s only available for Gallium-enabled drivers (radeon, nouveau) and also can’t log frame information. These two new tools should work with any driver and can be used for any game/app.

You can find installation instructions in the linked COPR repos. I do not intend to move these packages to official Fedora repos, but if somebody is willing to get their hands dirty and work on that, great, please contact me and I’ll try to help.

Enjoy!

New package in Fedora: sendKindle

sendKindle allows you to easily send documents to your Amazon Kindle device using a command line. You no longer need to open an email client, create a new email, fill in the recipient and a subject, add attachments, hit send, no. You just write sendKindle into your terminal, drag and drop the file, hit Enter. It’s faster 🙂

I already blogged about sendKindle before. It will use your email account (I tested just GMail) to send the file to your Amazon address. (As a bonus, I have a filter defined in GMail which will move these emails from the Sent mail to Trash, because I don’t want all the files to clutter my mailbox, and it works great.)

Recently I finally became a packager (hooray!) and pushed sendKindle as my first package into Fedora. It’s currently in updates-testing, so until it receives some karma or a week passes, you can install it like this:

$ yum install sendKindle --enablerepo=updates-testing

In a week you can use your favorite package manager without any further “complications”, because it will have landed in stable updates for Fedora 17 and 18.

The project lives at github, report all your problems there (except packaging bugs, which go to bugzilla). Be sure to see the README though – if you want new features, you need to provide patches.

Enjoy.

rpmguard: a wiki page and a new output format

rpmguard is a tool for checking differences between RPM packages. Since the last time I blogged about it, there have been some notable changes I would like to mention:

  1. There is a wiki page serving as a home page for this tool. Please visit:
    https://fedorahosted.org/autoqa/wiki/rpmguard
    The most important information (how to check out the code, how to report a bug) is there. Also some documentation starts to appear there, like the description of all of the available checks.
  2. rpmguard has a new output format that is more similar to rpmlint or lintian and should be easier to read and parse. Example output here (artificial, usually there is no output or just several lines):
    $ rpmguard package-1.rpm package-2.rpm
    W: requirement-added fooreq2
    W: requirement-added rpmlib(VersionedDependencies) <= 3.0.3-1
    W: requirement-removed fooreq1
    W: requirement-version-lowered fooreq3 = 0.3.4 -> >= 0.2.7
    W: provision-added fooprov1 = 0.1.0
    W: conflict-added fooconf >= 1.0
    W: obsolescence-removed fooobs
    W: config-file-added /etc/conf2
    W: config-file-changed /etc/conf1
    W: file-mode-changed /usr/share/justfile1 0644 -> 04744
    W: doc-files-count-reduced 2 -> 1
    W: executable-added /usr/bin/bin1
    N: 12 warnings
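
To illustrate the kind of comparison behind the requirement-added and requirement-removed lines, here is a rough shell approximation. It is not how rpmguard itself is implemented (rpmguard is a Python tool built on rpmdiff); the function name report_requires_diff and the two list files are assumptions for this example:

```shell
# report_requires_diff OLD_LIST NEW_LIST
# Each list holds one requirement per line, e.g. produced by
#   rpm -qp --requires package-1.rpm > old.txt
#   rpm -qp --requires package-2.rpm > new.txt
report_requires_diff() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted the same way, so pin the locale.
    LC_ALL=C sort -u "$1" > "$old_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"
    # comm -13 keeps lines found only in the new list,
    # comm -23 keeps lines found only in the old list.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/W: requirement-added /'
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/W: requirement-removed /'
    rm -f "$old_sorted" "$new_sorted"
}
```

Unlike rpmguard, this sketch treats a changed version constraint as one removal plus one addition instead of reporting a lowered version.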

rpmguard is soon to be plugged into our AutoQA framework to provide package update monitoring and checking. Stay tuned, comments welcome.

rpmguard – print important differences between RPMs

Package maintainers, listen up! 🙂

I have created a simple tool called rpmguard for checking differences between RPM packages. It is very similar to rpmdiff, but it prints only important changes, not all of them. Therefore it can be used every time a new package is built to easily see whether something has gone completely wrong.

So what can it do?

Currently rpmguard reports:

  • new or removed Requires/Provides/Obsoletes/Conflicts
  • lowering the version of Requires/Provides/Obsoletes/Conflicts
  • new, removed or changed config file
  • new or removed executable
  • reduced number of documentation files
  • changed user/group ownership
  • changed file mode permissions

All the above-mentioned changes are considered important enough for the maintainer to have at least a quick look at them.

Let’s see it in action

The following packages must be installed:

  • rpm
  • rpm-python
  • rpmlint (rpmdiff version 0.91 contains serious bugs, please use newer or from trunk – it’s important)

Then you run the tool simply by:

$ ./rpmguard.py package-1.rpm package-2.rpm

Example output (artificial, usually there is no output or just several lines):

added        REQUIRES fooreq2
added        REQUIRES rpmlib(VersionedDependencies) <= 3.0.3-1
removed      REQUIRES fooreq1
lowered      REQUIRES('= 0.3.4' -> '>= 0.2.7') fooreq3
added        PROVIDES fooprov1 = 0.1.0
added        CONFLICTS fooconf >= 1.0
removed      OBSOLETES fooobs
added        CONFIG /etc/conf2
changed      CONFIG /etc/conf1
changed      MODE(0644 -> 04744) /usr/share/justfile1
reduced      DOCS(2 -> 1)
added        EXECUTABLE /usr/bin/bin1

And now a more real-world example:

$ ./rpmguard.py kernel-2.6.31-0.86.rc3.git5.fc12.x86_64.rpm kernel-2.6.31.1-56.fc12.x86_64.rpm
added        REQUIRES rpmlib(PayloadIsXz) <= 5.2-1
added        REQUIRES dracut >= 001-7
added        REQUIRES grubby >= 7.0.4-1
removed      REQUIRES mkinitrd >= 6.0.61-1

Cool, where to get it?

rpmguard is currently part of AutoQA framework, which will be used for performing various checks on Fedora packages. You can download just the rpmguard from here:

http://git.fedorahosted.org/git/autoqa.git?a=tree;f=tests/rpmguard

or rather download the whole AutoQA:

git clone git://git.fedorahosted.org/git/autoqa.git

and look into autoqa/tests/rpmguard.

Feedback welcome

Any feedback is really welcome. If you have any ideas:

  • which other changes in RPMs should be reported
  • which changes should not be reported
  • how to adjust the program output
  • what else to improve
  • any other comments

please let me know under this blog or in the autoqa-devel mailing list. Thanks!

zsync – transfer large files efficiently

A few days ago I stumbled upon zsync, a tool for fast transfer of very large files. I noticed it because Ubuntu started to use it for its daily live images. And because I am curious, I studied it and realized that zsync is great! 🙂 I have also run some tests to see how well it works.

So what is the zsync?

zsync is a file transfer program. It allows you to download a file from a remote server, where you have a copy of an older version of the file on your computer already. zsync downloads only the new parts of the file. It uses the same algorithm as rsync. However, where rsync is designed for synchronising data from one computer to another within an organisation, zsync is designed for file distribution, with one file on a server to be distributed to thousands of downloaders. zsync requires no special server software — just a web server to host the files — and imposes no extra load on the server, making it ideal for large scale file distribution.

How does it work?

You simply generate a small .zsync file on the server for each big file you offer for download. This .zsync file contains a description of the contents of the big file. The user then runs the “zsync” tool with your .zsync file as an argument and uses an arbitrary file as a base for the new big file. It can really be any file: a previous version of the big file, a pre-pre-pre-previous version, or even a newer version of it! But the more similar it is, the better. The zsync tool then compares the files and downloads only the differences needed to assemble the new big file.
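
In practice this boils down to one command on each side. The sketch below wraps them in two functions of my own naming; the file names and the example.com URLs in the comments are placeholders, and nothing is executed until you call the functions on a machine with zsync installed:

```shell
# publish_zsync FILE URL   (server side)
# Creates FILE.zsync in the current directory. -u records the URL from which
# clients should download the actual data.
publish_zsync() {
    zsyncmake -u "$2" "$1"
}

# fetch_with_zsync OLD_FILE ZSYNC_URL   (client side)
# Uses OLD_FILE as the base (-i) and downloads only the blocks that differ.
fetch_with_zsync() {
    zsync -i "$1" "$2"
}

# Example (not run here):
#   publish_zsync live-20090901.iso http://example.com/live-20090901.iso
#   fetch_with_zsync live-20090831.iso http://example.com/live-20090901.iso.zsync
```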

Usages

zsync is a great tool for developers who regularly download updated big files like daily CD/DVD images and similar stuff. It can really save you a lot of time and bandwidth if the files haven’t changed a lot. You might appreciate it especially if you or the server have a slow connection. So what are you waiting for?

Pitfalls

Oh, I haven’t told you yet. zsync is not currently available in Fedora because of some license problems 😦 But you can always find RPMs elsewhere.

Testing the efficiency

Here are the tests I have performed to see how well it works.

Ubuntu Karmic development images:

base image          resulting image     download size   saved bandwidth
20090830 (703 MB)   20090831 (687 MB)   57 MB           91.7%
20090831 (687 MB)   20090901 (687 MB)   1 MB            99.9%
20090830 (703 MB)   20090901 (687 MB)   57 MB           91.7%
20090901 (687 MB)   20090830 (703 MB)   73 MB           89.6%

As you can see, zsync works great for Ubuntu images. When updating between very recent images, you can save about 90% of bandwidth (and time). Unfortunately, I didn’t have any older images at hand to test updating from a not-so-recent image. But notice that with zsync you can also go “back in time” from a more recent image to an older one (the last line).
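
The “saved bandwidth” column is simply one minus the downloaded size divided by the size of the resulting image. A short sketch of that arithmetic (the function name saved_pct is made up; because the sizes in the tables are rounded to whole megabytes, a recomputed value can differ from the table by 0.1):

```shell
# saved_pct DOWNLOAD_MB RESULT_MB
# Prints the percentage of the resulting image that did not need downloading.
saved_pct() {
    awk -v dl="$1" -v size="$2" 'BEGIN { printf "%.1f\n", (1 - dl / size) * 100 }'
}

saved_pct 57 687    # Ubuntu 20090830 -> 20090831, prints 91.7
saved_pct 73 703    # Ubuntu 20090901 -> 20090830, prints 89.6
```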

Fedora 12 Alpha images:

base image       resulting image   download size   saved bandwidth
RC1 (701 MB)     RC2 (705 MB)      241 MB          65.8%
RC2 (705 MB)     Final (705 MB)    <1 MB           100.0%
RC1 (701 MB)     Final (705 MB)    241 MB          65.8%
Final (705 MB)   RC1 (701 MB)      237 MB          66.2%

For Fedora 12 Alpha images you can still save more than half when updating from RC1 to RC2 and you will be extremely pleased when updating from RC2 to Alpha Final. Although those two files are not the same, they are nearly the same (probably just some system label has changed) and you will get the image instantly.

Fedora 12 nightly composes:

base image          resulting image     download size   saved bandwidth
20090809 (703 MB)   20090818 (702 MB)   287 MB          59.1%
20090818 (702 MB)   20090824 (630 MB)   404 MB          35.8%
20090824 (630 MB)   20090825 (636 MB)   361 MB          43.3%
20090825 (636 MB)   20090827 (640 MB)   340 MB          46.9%
20090809 (703 MB)   20090827 (640 MB)   423 MB          33.9%
20090827 (640 MB)   20090809 (703 MB)   486 MB          30.8%

For Fedora 12 nightly composes, the savings are clearly not as great as for Ubuntu images. Interestingly, some long-term updates can be more efficient than some short-term ones; it depends on how much the contents have changed over time. In these tests the savings ranged from about 31% to 59%, which is still very nice. Why do the Fedora images differ so much compared to the Ubuntu images? You can read Oxf13’s explanation from a Fedora QA IRC meeting: the whole ISO is a squashfs image, which is not suitable for this kind of difference comparison. Ubuntu probably uses another method that is more compatible with the zsync algorithm.

Conclusion

So what do you think? Should there be .zsync files for ISO images in Fedora? I hope this article will help the infrastructure team to decide 🙂