Bisecting Fedora kernel

This post shows how to bisect a Fedora kernel to find the source of a regression. I needed to do that recently and found no good guide, so I’m capturing my notes here; perhaps you’ll find them useful. This approach can identify exactly which commit caused bad kernel behavior on your hardware, so that you can report it to the kernel maintainers. Note that you need a reliable way of reproducing the problem. If it happens randomly and infrequently, it’s much harder to debug.

0. Try the latest Rawhide kernel

Before you spend too much time on this, it’s always worth a shot to test the latest Rawhide kernel. Perhaps the bug is fixed already?

Usually the kernel consists of these installed packages: kernel, kernel-core, kernel-modules, kernel-modules-core, kernel-modules-extra. But check what you actually have installed on your system, e.g. with: rpm -qa | grep ^kernel | sort

Install the latest Rawhide kernel:

sudo dnf update --setopt=installonly_limit=0 --repo fedora --releasever rawhide kernel{,-core,-modules,-modules-core,-modules-extra}

You want to use --setopt=installonly_limit=0 throughout this exercise to make sure you don’t accidentally remove a working kernel from your system and don’t end up with just broken ones (there’s a limit of three kernels installed at the same time by default). But it means you’ll need to remove tested kernels manually from time to time, otherwise you run out of space in /boot.

Reboot and keep pressing F8 during startup to display the GRUB boot menu. Make sure to select the newly installed kernel, boot it, test it. Note down whether it’s good or bad. If the problem is still there, we’ll need to continue debugging.

Note: You can’t remove a kernel while you’re running it, so boot a different one first. Then use the standard dnf remove to get rid of the tested kernel, or use dnf history for a more convenient way (e.g. dnf history undo last).

I. Narrow down the issue in Fedora-packaged kernels

As the first step, it’s useful to figure out which Fedora-packaged kernel is the last one with good behavior (a “good kernel“), and which one is the first one with bad behavior (a “bad kernel“). That will help you narrow down the scope. It’s much faster to download and install already built kernels than to compile your own (which we’ll do later).

Most probably you’re currently running a bad kernel (because you’re reading this). So reboot, display the GRUB boot menu and boot an older kernel. See if it’s good or bad, and note it down. Unless the problem is very recent, all available kernels (usually three) in the GRUB menu will be bad, and it’s time to start downloading older kernels from Koji. Use a reasonable strategy, e.g. install a month-old kernel, or one several months old, then gradually halve the intervals until you find the latest good kernel. You don’t need to worry about using kernels from other Fedora releases (visible in their .fcNN suffix); they are standalone and work in any release. You can download the kernel subpackages manually, or use the koji command (from the koji package), e.g.:

koji download-build --arch x86_64 kernel-6.5.0-0.rc6.43.fc39

That downloads many more subpackages than you need, so install just those needed (see the previous section), e.g. like this:

sudo dnf --setopt=installonly_limit=0 install ./kernel{,-core,-modules,-modules-core,-modules-extra}-6.5*.rpm

For each picked kernel, install it, boot into it, test it, note down whether it’s good or bad. Continue until you’ve found the latest good packaged kernel and the first bad packaged kernel.
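The halving strategy can be sketched with a small helper. This is only an illustration: middle_build is a made-up function name, and the build names below are placeholders for whatever candidates you collected from Koji between your last known good and first known bad kernel.

```shell
# Read a chronologically sorted list of build names on stdin
# (oldest first, one per line) and print the one in the middle.
middle_build() {
    awk '{ line[NR] = $0 } END { if (NR > 0) print line[int((NR + 1) / 2)] }'
}

# Placeholder candidates, oldest first (not real build names)
printf '%s\n' \
    kernel-build-1 \
    kernel-build-2 \
    kernel-build-3 \
    kernel-build-4 \
    kernel-build-5 | middle_build    # prints kernel-build-3
```

Test the printed build. If it’s good, drop it and everything older from the list; if it’s bad, drop it and everything newer. Then pick the middle again.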

II. Find git commits used for building identified good and bad kernels

Now that you have the closest good and bad packaged kernel, we need to figure out which git commits from the upstream Linux kernel were used to build them. In some cases, the git commit hash is included directly in the RPM filename. For example in my case, I reported that kernel-6.4.0-0.rc0.20230427git6e98b09da931.5.fc39 is the last good kernel, and kernel-6.4.0-0.rc0.20230428git33afd4b76393.7.fc39 is the first bad kernel. From those filenames, you can see that git commit 6e98b09da931 is good and git commit 33afd4b76393 is bad.
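If you have several build names to go through, the hash can be pulled out with sed. This is a minimal sketch that assumes the naming pattern seen in the two examples above (an 8-digit date, then "git", then the abbreviated hash); commit_from_nvr is just a name I made up for the helper.

```shell
# Print the git commit hash embedded in a Fedora kernel build name.
# Returns a non-zero status if the name does not contain one.
commit_from_nvr() {
    commit=$(printf '%s\n' "$1" | sed -n 's/.*[0-9]\{8\}git\([0-9a-f]\{7,40\}\)\..*/\1/p')
    [ -n "$commit" ] && printf '%s\n' "$commit"
}

commit_from_nvr kernel-6.4.0-0.rc0.20230427git6e98b09da931.5.fc39   # prints 6e98b09da931
commit_from_nvr kernel-6.4.0-0.rc0.20230428git33afd4b76393.7.fc39   # prints 33afd4b76393
commit_from_nvr kernel-6.5.0-0.rc6.43.fc39 || echo "no hash in the name, check the .src.rpm"
```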

The commit hash is not always part of the filename, as with kernel-6.5.0-0.rc6.43.fc39. In that case, you need to download the .src.rpm file from that build, either manually from Koji, or using:

koji download-build --arch src kernel-6.5.0-0.rc6.43.fc39

Unpack that .src.rpm (my favorite decompress tool is deco), find the linux-*.tar.xz archive and run the following command (adjust the archive filename):

$ xzcat -qq linux-6.5-rc6.tar.xz | git get-tar-commit-id
2ccdd1b13c591d306f0401d98dedc4bdcd02b421

(This command is documented in the kernel.spec file, which is in the same directory.) Now you know the git commit hash used for that kernel build. Figure out the commits for both the good and the bad kernel you identified.

III. Use git bisect to find the exact commit that broke it

It’s time to clone the upstream Linux kernel repo:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/src/linux

And also the Fedora distgit kernel repo:

fedpkg clone -a ~/distgit/kernel

We’ll now use git bisect to arrive at the breaking commit which caused the problem. After each step, we’ll need to build the kernel, test it, and mark it as good or bad. Let’s start:

cd ~/src/linux
git bisect start
git bisect good YOUR_GOOD_COMMIT
git bisect bad YOUR_BAD_COMMIT

Git now prints a commit hash to be tested (and switches the repository to that commit), and an estimate of how many steps remain. We now need to take the current contents of the source code and build our own kernel.

Note: When building the kernel, I was advised to avoid the overhead of packaging, to speed up the process. I’m sure it’s good advice, but I didn’t find a good guide on how to do that (including how to retrieve the Fedora kernel config, build the kernel manually, copy it to the right places, create initramfs, create a boot option in GRUB, etc). So I just ran the whole process including packaging. On my machine, the compilation took about 40 minutes and packaging took 10 minutes, and I needed to do about 11 rounds, so it was an OK tradeoff for me. (If you can write a guide on how to do that without packaging, please do and link it in the comments, I’d love to read it).

Let’s create a tarball of the current source code like this:

git archive --prefix=linux-local/ HEAD | xz -0 -T0 > linux-local.tar.xz

Usually the tarballs have a version number in both the filename and the included directory (which is then also matched in the spec file). You can do that if you wish; I didn’t want to spend too much time on throwaway builds, so I just used a static filename and overwrote it each time.

Let’s move the tarball to the distgit repo:

mv ~/src/linux/linux-local.tar.xz ~/distgit/kernel/

Now we need to adjust the distgit spec file a bit:

cd ~/distgit/kernel
# edit kernel.spec

I made the following changes to the spec file:

-# define buildid .local
+%define buildid .local
-%define specrpmversion 6.4.9
+%define specrpmversion 6.4.0
-%define specversion 6.4.9
+%define specversion 6.4.0
-%define tarfile_release 6.4.9
+%define tarfile_release local
-%define specrelease 200%{?buildid}%{?dist}
+%define specrelease 0.gitYOUR_TESTED_COMMIT%{?buildid}%{?dist}

Now we can start the build:

nice fedpkg mockbuild --with baseonly --with vanilla --without debuginfo

Options --with baseonly and --without debuginfo make sure we don’t build unnecessary stuff. --with vanilla was needed, because Fedora-specific patches didn’t apply to the older source code.

After a long time, your results should be available in results_kernel/ and look something like this:

$ ls -1 results_kernel/6.4.0/0.git6e98b09da931.local.fc38/
build.log
hw_info.log
installed_pkgs.log
kernel-6.4.0-0.git6e98b09da931.local.fc38.src.rpm
kernel-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-core-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-devel-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-devel-matched-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-core-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-extra-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-modules-internal-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
kernel-uki-virt-6.4.0-0.git6e98b09da931.local.fc38.x86_64.rpm
root.log
state.log

Note that all the RPMs carry the git commit hash identifier that you specified in the spec file. Now you just need to install the kernel (as described in section I), boot it (make sure to display the GRUB menu and verify that the correct kernel is selected), and test it.

Note: If you have Secure Boot enabled, you’ll need to disable it in order to boot your own kernel (or figure out how to sign it yourself). Don’t forget to re-enable it once this is all over.

Once you’ve determined whether this kernel is good or bad, tell it to git bisect:

cd ~/src/linux
git bisect good   # or bad

And now the whole cycle repeats. Create a new archive using git archive, move it to the distgit directory, adjust the specrelease field in kernel.spec to match the new commit hash, and use fedpkg to build another kernel. Eventually, git bisect will print out the exact commit that caused the problem.
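Editing specrelease by hand each round gets tedious, so here is a minimal sketch of doing it with sed. It assumes GNU sed (for -i) and a spec that contains a line starting with "%define specrelease", as in the diff above; set_specrelease is a helper name I made up, not something from the kernel distgit.

```shell
# Rewrite the specrelease line in kernel.spec so that the built RPMs
# carry the commit being tested.
#   $1: path to kernel.spec
#   $2: abbreviated commit hash
set_specrelease() {
    spec=$1
    commit=$2
    sed -i "s|^%define specrelease .*|%define specrelease 0.git${commit}%{?buildid}%{?dist}|" "$spec"
}

# Usage during a bisect round (not run here):
#   set_specrelease ~/distgit/kernel/kernel.spec \
#       "$(git -C ~/src/linux rev-parse --short=12 HEAD)"
```

The other spec changes (buildid, versions, tarfile_release) stay the same between rounds, so only this one line needs to change.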

IV. Report your findings

Report the problem and the identified breaking commit into Red Hat Bugzilla under the kernel component. Please also save and attach the bisect log:

cd ~/src/linux
git bisect log > git-bisect-log.txt

Then also report this problem (possibly a regression) to the kernel upstream and mention it in the RH Bugzilla ticket. Thanks and good luck.

Connecting to Libera.Chat through Matrix

After the last IRC changes, some of the Matrix->IRC bridges got disconnected, some rerouted (to Libera.Chat), and everything is work in progress. I’ve been a Matrix user for the past few months, and I definitely don’t want to go back to IRC. But in order to stay connected to the Fedora community, some steps were needed. Here’s a blog post to help me remember the necessary steps, in case I need it again in the future.

Note: Ideally, I wouldn’t need to interact with Libera.Chat in any way, and all important Fedora Matrix rooms would be bridged to IRC. However, that’s not the case at the moment (they are working on it). Also, some IRC rooms require registration, otherwise you can’t talk to them. It is unclear whether some solution is implemented to allow Matrix users to speak in such a room without Libera.Chat registration. So I had to give up and create a Libera.Chat account and set up services to identify me on that network. This guide includes the necessary steps. Hopefully it can be avoided in the future.

This guide performs all the necessary steps from your Matrix account. No IRC client is needed.

First, join some bridged room in your Matrix client, #fedora-devel:matrix.org is a popular choice. This should create a connection to Libera.Chat as well, because of the bridge.

Second, create a discussion with a bot named @appservice:libera.chat. That’s your IRC admin room. Type !help for a list of commands.

Type !nick to see your current Libera.Chat nick. Mine was kparal[m]. I changed it to kparal using the same command:

> !nick

Format: '!nick DesiredNick' or '!nick irc.server.name DesiredNick'
Currently connected to IRC networks:
irc.libera.chat as kparal[m]

> !nick kparal

Nick changed from 'kparal[m]' to 'kparal'.

> !nick

Format: '!nick DesiredNick' or '!nick irc.server.name DesiredNick'
Currently connected to IRC networks:
irc.libera.chat as kparal

Now, type !listrooms to list all IRC rooms you’re currently connected to, including where the bridge points to. You should at least see the room you joined originally. If you are connected to a room which is not listed here, it means it is not bridged to Libera.Chat. My example:

> !listrooms

You are joined to 4 rooms:

#fedora-admin which is bridged to Fedora Infrastructure Team, !jaUhEeJGegYfphMOke:libera.chat
#fedora-workstation which is bridged to Fedora Workstation
#fedora-devel which is bridged to Fedora Devel, !OiUqPxkucYgjgQVNoR:libera.chat
#fedora-qa which is bridged to #fedora-qa

Now you have to register your username on Libera.Chat. In your Matrix client, create a discussion with a bot named @NickServ:libera.chat. That’s an account service bot. Type help to receive some basic help.

If you type info, you’ll probably receive a message that you’re not registered:

> info

kparal is not registered.

Now pick a password and your email address and register:

> register your-password your@email

An email containing nickname activation instructions has been sent to your@email.

Check your email for a verification code, then type it in (and wait, this took a few minutes in my case):

> verify register your-nick verification-code

your-nick has now been verified.

Type info, this time you should receive lots of information about your account. You can also use status or acc (the right return value should be 3):

> info

Information on kparal (account kparal):
...

> status

You are logged in as kparal.

> acc kparal

kparal ACC 3

OK, it’s now time to return to @appservice:libera.chat and set up automatic identification (“logging in”) for Libera.Chat, so that it happens any time you re-join the IRC network. Store your username and password with the appservice:

> !username your-nick

Successfully stored username for irc.libera.chat. Use !reconnect to use this username now.

> !storepass your-password

Successfully stored password for irc.libera.chat. Use !reconnect to use this password now.

Now test it by reconnecting to Libera.Chat and checking your nick and rooms:

> !reconnect

Reconnecting to network...

> !nick

Format: '!nick DesiredNick' or '!nick irc.server.name DesiredNick'
Currently connected to IRC networks:
irc.libera.chat as kparal

> !listrooms

You are joined to 4 rooms:

#fedora-admin which is bridged to Fedora Infrastructure Team, !jaUhEeJGegYfphMOke:libera.chat
#fedora-workstation which is bridged to Fedora Workstation
#fedora-devel which is bridged to Fedora Devel, !OiUqPxkucYgjgQVNoR:libera.chat
#fedora-qa which is bridged to #fedora-qa

Everything seems to be working now, hopefully.

Remember, you can join Matrix-native rooms by searching for them in your client, and check whether they’re bridged using !listrooms. If you need to join a non-bridged IRC room, you can do so by entering the #room-name:libera.chat room.

I hope this helped somebody (or the future me). The user experience is likely to improve in the future.

Show a side-by-side git diff on any commit in tig using Meld

Side-by-side diffs are more readable to me than in-line diffs. A long time ago, I started using Meld to display them when working with git. But I always needed to manually specify branch or commit names. This week I finally spent some time and found a way to invoke Meld directly from tig, so that I can see the diff side-by-side while browsing a commit history in tig (for example, when I want to review a proposed branch containing 10 new commits, and I want to inspect each of them individually). Here’s a short howto.

First, let’s configure Meld as your git difftool:

git config --global diff.tool meld

You can now see a diff between two branches/commits with:

git difftool -d branch1 branch2

That’s a lot of typing, though, so let’s create a handy alias:

git config --global alias.dt 'difftool -d'

And now you can use:

git dt branch1 branch2

And now, let’s integrate this into tig. Edit ~/.config/tig/config and add this snippet:

# use difftool to compare a commit in main/diff view with its parent
# https://github.com/jonas/tig/issues/219#issuecomment-406817763
bind main w !git difftool -d %(commit)^!
bind diff w !git difftool -d %(commit)^!

Notice I chose the “w” key as a shortcut key, because it’s unassigned by default. You can choose a different shortcut of course, see man tigrc.

Now, any time you want to see a side-by-side diff of a commit displayed in tig, simply press w and the diff between the commit and its parent shows up in Meld.

This improved my life a lot, perhaps it helped you as well 🙂 Cheers.

Taskotron is EOL (end of life) today

As previously announced, Taskotron (project page) will be shut down today. See the announcement and its discussion for more details and some background info.

As a result, certain tests (beginning with “dist.“) will no longer appear for new updates in Bodhi (in Automated Tests tab). Some of those tests (and even new ones) will hopefully come back in the future with the help of Fedora CI.

Thank you to everyone who contributed to Taskotron in the past or found our test reports helpful.


Automatically shrink your VM disk images when you delete files (Fedora 32 update)

I’ve already written about this in 2017, but things got simpler since then. Time for an update!

If you use virtual machines, your disk images just grow and grow, but never shrink – deleted files inside a VM never free up the space on the host. But you can configure the VM to handle TRIM (discard) commands, and then your disk images will reflect deleted files and shrink as well. Here’s how (with Fedora 32 using qemu 4.2 and virt-manager 2.2).

Adjust VM configuration

  1. When creating a new VM, use qcow2 disk images (that’s the default), not raw.
  2. Your new VM should have VirtIO disks (that’s the default).
  3. In virt-manager in VM configuration, select your VirtIO disk, go to Advanced -> Performance, and set Discard mode: unmap.

Test it

Now boot your VM and try to issue a TRIM command:

$ sudo fstrim -av
/boot: 908.5 MiB (952631296 bytes) trimmed on /dev/vda1
/: 6.8 GiB (7240171520 bytes) trimmed on /dev/mapper/fedora-root

You should see some output printed, even if it’s just 0 bytes trimmed, not an error.

Let’s see if the disk image actually shrinks. You need to list its size using du (or ls -s) to see the disk allocated size, not the apparent file size (because the disk image is sparse):

$ du -h discardtest.qcow2 
1.4G discardtest.qcow2
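To see the difference between the apparent and the allocated size for yourself, here is a small sketch that compares the two on a throwaway sparse file. It assumes GNU du (for --apparent-size); image_sizes_kib is a helper name I made up.

```shell
# Print "<apparent KiB> <allocated KiB>" for a file.
image_sizes_kib() {
    apparent=$(du -k --apparent-size "$1" | cut -f1)   # size the file claims to have
    allocated=$(du -k "$1" | cut -f1)                  # space actually used on disk
    echo "$apparent $allocated"
}

# Demo on a throwaway 100 MiB sparse file
demo=$(mktemp)
truncate -s 100M "$demo"
image_sizes_kib "$demo"
rm -f "$demo"
```

On a filesystem that supports sparse files, the second number should be far smaller than the first. The same function can be pointed at your qcow2 image on the host.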

Now create a file inside the VM:

$ dd if=/dev/urandom of=file bs=1M count=500

We created a 500 MB file inside the VM and the disk image grew accordingly (give it a few seconds):

$ du -h discardtest.qcow2
1.9G discardtest.qcow2

Now, remove the file inside the VM and issue a TRIM:

$ rm -f file
$ sudo fstrim -av

And the disk image size should shrink back (give it a few seconds):

$ du -h discardtest.qcow2
1.4G discardtest.qcow2

If you configure your system to send TRIM in real-time (see below), it should shrink right after rm and no fstrim should be needed.

Issue TRIM automatically

With Fedora 32, fstrim.timer is automatically enabled and will trim your system once per week. You can reconfigure it to run more frequently, if you want. You can check the timer using:

$ sudo systemctl list-timers
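One way to make the timer run more often is a systemd drop-in file. The following is a sketch, not something I’ve tested on every release: it assumes the packaged timer sets its schedule with OnCalendar=, and write_fstrim_override is a helper name I made up. The second argument exists only so you can try it in a scratch directory first.

```shell
# Write a drop-in that changes the fstrim.timer schedule.
#   $1: schedule in systemd calendar syntax (e.g. daily)
#   $2: drop-in directory; defaults to the standard systemd location
write_fstrim_override() {
    schedule=$1
    dir=${2:-/etc/systemd/system/fstrim.timer.d}
    mkdir -p "$dir"
    # The empty OnCalendar= clears the packaged schedule before setting a new one.
    printf '[Timer]\nOnCalendar=\nOnCalendar=%s\n' "$schedule" > "$dir/override.conf"
}

# Inside the VM, as root (not run here):
#   write_fstrim_override daily
#   systemctl daemon-reload
#   systemctl restart fstrim.timer
```

Afterwards, systemctl list-timers should show the new next-run time for fstrim.timer.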

If you want a real-time TRIM, edit /etc/fstab in the VM and add a discard mount option to the filesystem in question, like this:

UUID=6d368798-f4c2-44f9-8334-6be3c64cc449 / ext4 defaults,discard 1 1

This has some performance impact (they say), but the disk image will shrink right after a file is deleted. (Note: XFS as a root filesystem doesn’t issue TRIM commands without additional tweaking, read more here).

Stay informed about QA events

Hello, this is a reminder that you can easily stay informed about important upcoming QA events and help with testing Fedora, especially now during Fedora 32 development period.

The first obvious option for existing Fedora contributors is to subscribe to the test-announce mailing list. We announce all our QA meetings, test days, composes nominated for testing and other important information in there.

A second, less well-known option, which I want to highlight today, is to add our QA calendar to your calendar software (Google Calendar, Thunderbird, etc). You’ll see our QA meetings (including blocker review meetings) and test days right next to your personal events, so they will be hard to miss. A guide on how to do that is available on our QA homepage.

Thank you everyone who joins our efforts and helps us make Fedora better.

Disabling kinetic scrolling in Firefox

In Firefox 70, there is a new feature called kinetic scrolling [1]. If you scroll a web page using a trackpad (or possibly a touchscreen), the scrolling does not stop immediately after you release your fingers, but gradually slows down, like a spinning wheel coming to rest. After using it for a short while, I started to hate it really quickly. The problem is that the slowdown-and-stop takes very long, and if you just want to scroll the page a bit to continue reading, you need to wait several seconds until it fully stops moving. That’s really annoying. Fortunately, this cool new feature can be disabled. Just open the about:config page in a new tab, search for apz.gtk.kinetic_scroll.enabled and set it to false. Tada! No more kinetic scrolling.

[1] I found these related Mozilla tickets: #1213601, #1564238
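If you’d rather keep this setting in a file (for example to carry it over to another profile), Firefox also reads preferences from a user.js file in the profile directory on every start. A minimal sketch; disable_kinetic_scroll is a helper name I made up and the profile path is a placeholder.

```shell
# Append the preference to user.js in the given Firefox profile directory.
disable_kinetic_scroll() {
    profile_dir=$1
    echo 'user_pref("apz.gtk.kinetic_scroll.enabled", false);' >> "$profile_dir/user.js"
}

# Usage (see about:profiles for your real profile directory; not run here):
#   disable_kinetic_scroll ~/.mozilla/firefox/YOUR_PROFILE
```

Restart Firefox afterwards for the change to take effect.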

New Fedora package: ntfs-3g-system-compression

If you have a Windows 10 installation, you might not be able to read all files on its NTFS partition. Under certain conditions, Microsoft compresses system files with new compression algorithms which the ntfs-3g driver can’t currently read. Files are displayed with question marks when listed using ls, and you’ll see Input/output error or unsupported reparse point when trying to access these files. Here’s an example:

$ ls -l Windows
...
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33  ModemLogs
-????????? ? ?      ?             ?            ?  notepad.exe
drwxrwxrwx 1 kparal kparal        0 Dec 14 21:57  OCR
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33 'Offline Web Pages'
drwxrwxrwx 1 kparal kparal    16384 Dec 14 14:17  Panther
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33  Performance
-rwxrwxrwx 1 kparal kparal   984966 Feb 13 23:15  PFRO.log
drwxrwxrwx 1 kparal kparal        0 Sep 15 09:33  PLA
drwxrwxrwx 1 kparal kparal    49152 Dec 14 21:59  PolicyDefinitions
drwxrwxrwx 1 kparal kparal   163840 Feb 14 22:41  Prefetch
drwxrwxrwx 1 kparal kparal     4096 Dec 14 14:15  PrintDialog
-????????? ? ?      ?             ?            ?  Professional.xml
drwxrwxrwx 1 kparal kparal     4096 Sep 15 09:33  Provisioning
-????????? ? ?      ?             ?            ?  regedit.exe
drwxrwxrwx 1 kparal kparal        0 Dec 14 22:09  Registration
drwxrwxrwx 1 kparal kparal        0 Sep 15 11:11  RemotePackages
...

$ ls -l Windows/notepad.exe
ls: cannot access 'Windows/notepad.exe': Input/output error

$ cp Windows/notepad.exe .
cp: cannot stat 'Windows/notepad.exe': Input/output error

$ stat Windows/notepad.exe
File: Windows/notepad.exe -> unsupported reparse point
Size: 25            Blocks: 0          IO Block: 4096   symbolic link
Device: 803h/2051d    Inode: 247077      Links: 3
Access: (0777/lrwxrwxrwx)  Uid: ( 1000/  kparal)   Gid: ( 1000/  kparal)
Access: 2019-02-14 22:40:13.270993900 +0100
Modify: 2018-09-15 09:28:56.687095900 +0200
Change: 2018-12-14 21:52:10.685553700 +0100
Birth: -

Fortunately, there’s an ntfs-3g-system-compression plugin that allows you to read those files:

$ ls -l Windows/notepad.exe 
-r-xr-xr-x 3 kparal kparal 254464 Sep 15 09:28 Windows/notepad.exe

The new package is now proposed as an update in Bodhi, but in a week or so you should be able to install it with a simple:

$ sudo dnf install ntfs-3g-system-compression

Enjoy.

Whitelisting rpmlint errors in Taskotron/Bodhi

If you submit a new Fedora update into Bodhi, you’ll see an Automated Tests tab on that update page (an example), and one of the test results (once it’s done) will be from rpmlint. If you click on it, you’ll get a full log with rpmlint output.

If you wish to whitelist some errors which are not relevant for your package or are clearly a mistake (like spelling issues, etc), it is now possible. The steps are described at:

https://fedoraproject.org/wiki/Taskotron/Tasks/dist.rpmlint#Whitelisting_errors

This has often been requested, so hopefully it will help you have the automated test results all in green, instead of being bothered by invalid errors. If something doesn’t work, and it seems to be our bug in how we execute rpmlint (instead of a bug in rpmlint itself), please file a bug in task-rpmlint or contact us (qa-devel mailing list, #fedora-qa IRC channel on Freenode).