Warm Reboot on Linux with kexec (Remember QEMM?)

If you are old enough to remember QEMM from back in the ’90s, along with other tools we used to squeeze every last byte of memory under the 640KB limit, you may remember a rather cool feature it had – warm reboot.

What is a Warm Reboot?

A normal reboot involves the computer doing a Power-On Self Test (POST). This takes time, often as much as a few minutes on some servers and workstations. When you are setting something up and need to check frequently that things come up correctly at boot time, the POST can make progress painfully slow. If only we had something like the warm reboot feature QEMM had back in the ’90s, which let us reset the RAM and reboot DOS without restarting the entire machine and sitting through the POST. Well, such a thing does exist in modern Linux.

Enter kexec

kexec allows us to do exactly this – load a new kernel, kill all processes, and hand over control to the new kernel as the bootloader does at boot time. What do we need for this magic to work? On a modern distro, not much; it is all already included. Let’s start with a script that I use, and then explain what each component does:

#!/bin/bash

systemctl isolate multi-user.target

rmmod nvidia_drm nvidia_modeset nvidia_uvm
rmmod nvidia

kexec --load "/boot/vmlinuz-$(uname -r)" \
      --initrd="/boot/initramfs-$(uname -r).img" \
      --command-line="$(cat /proc/cmdline)"

kexec --exec

Let’s look at the kexec lines first. uname -r returns the current kernel version. The $(uname -r) bash syntax allows us to take the output of a command and use it as a string in the invoking command. On a recent CentOS 8 system, here is what we get:

$ uname -r
4.18.0-193.6.3.el8_2.centos.plus.x86_64
$ echo $(uname -r)
4.18.0-193.6.3.el8_2.centos.plus.x86_64

The kernel and initial ramdisk usually have the kernel version in their names in /boot/:

$ ls /boot/
initramfs-4.18.0-193.6.3.el8_2.centos.plus.x86_64.img
vmlinuz-4.18.0-193.6.3.el8_2.centos.plus.x86_64

So in our warm reboot script, vmlinuz-$(uname -r) will expand to vmlinuz-4.18.0-193.6.3.el8_2.centos.plus.x86_64. The same happens with the initramfs file name.
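The script assumes the EL-style file names shown above, so it can be worth checking that both files exist before anything is torn down, rather than letting kexec fail after the graphical target has already been stopped. A minimal sketch, assuming vmlinuz-<version> and initramfs-<version>.img naming (Debian-style systems use initrd.img-<version> instead); kexec_images and its optional boot directory argument are hypothetical names introduced here for illustration:

```shell
#!/bin/bash
# Print the kernel and initramfs paths for a kernel version, one per line,
# or report the first missing file on stderr and return non-zero.
# Assumes EL-style names: vmlinuz-<version> and initramfs-<version>.img.
kexec_images() {
    local version=$1
    local boot_dir=${2:-/boot}   # optional override, e.g. a scratch directory
    local kernel="$boot_dir/vmlinuz-$version"
    local initrd="$boot_dir/initramfs-$version.img"
    local f
    for f in "$kernel" "$initrd"; do
        if [ ! -r "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    printf '%s\n%s\n' "$kernel" "$initrd"
}

# Usage, placed before the kexec --load line of the warm reboot script:
#   kexec_images "$(uname -r)" > /dev/null || exit 1
```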

Next, what is in /proc/cmdline? It contains the boot parameters that the currently running kernel was booted with, as provided in our GRUB configuration, for example:

$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos2)/vmlinuz-4.18.0-193.6.3.el8_2.centos.plus.x86_64 root=ZFS=tank/ROOT quiet elevator=deadline transparent_hugepage=never

This is the minimum needed to boot the kernel. Once we have supplied this information, we initiate the shutdown and process purge, and hand over to the new kernel, using:

kexec --exec
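The BOOT_IMAGE= entry in the /proc/cmdline example above is added by GRUB and describes the previous boot rather than the kexec one. Passing it through is normally harmless, but if you prefer a tidier command line for the new kernel, here is a sketch of a filter that drops it. clean_cmdline is a hypothetical helper; it re-joins the remaining parameters with single spaces, so a quoted parameter containing spaces would not survive intact.

```shell
#!/bin/bash
# Print a kernel command line with GRUB's BOOT_IMAGE= entry removed.
# Reads /proc/cmdline by default; an alternative file can be given for testing.
clean_cmdline() {
    local cmdline_file=${1:-/proc/cmdline}
    local out="" word
    set -f    # disable globbing so parameters are never expanded as patterns
    for word in $(cat "$cmdline_file"); do
        case $word in
            BOOT_IMAGE=*) ;;                    # drop the GRUB-added entry
            *) out="${out:+$out }$word" ;;      # keep everything else, in order
        esac
    done
    set +f
    printf '%s\n' "$out"
}

# Usage in the warm reboot script:
#   --command-line="$(clean_cmdline)"
```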

But what are the systemctl and rmmod lines about? They are mostly there to work around the finickiness of Nvidia drivers and GPUs. If you execute kexec immediately, with the Nvidia driver still loaded, the GPU won’t reset properly and won’t get properly re-initialised by the driver when the kernel warm-boots. So we have to rmmod the nvidia driver. The legacy Nvidia driver only includes the nvidia module. Newer versions also include nvidia_drm, nvidia_modeset and nvidia_uvm, which depend on the nvidia module, so we have to remove those first.

But before we do that, we have to make sure that Xorg isn’t running, otherwise we won’t be able to unload the nvidia driver. To make sure the graphical environment isn’t running, we switch the systemd target to multi-user.target (on a workstation we are probably running graphical.target by default). Once Xorg is no longer running, we can unload the nvidia driver modules. With that done, we can proceed with the warm boot and enjoy the time saved.
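rmmod reports an error for any module that is not loaded, which matters if the same script runs on machines with the legacy driver (nvidia only) as well as on ones with the newer module set. A sketch that lists only the Nvidia modules present in /proc/modules, dependents first and the core nvidia module last; nvidia_unload_order and its optional file argument are hypothetical:

```shell
#!/bin/bash
# List the Nvidia modules that are currently loaded, in a safe unload order:
# the modules that depend on nvidia first, the core nvidia module last.
nvidia_unload_order() {
    local modules_file=${1:-/proc/modules}
    local mod
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        # The first field of each /proc/modules line is the module name.
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}

# Usage in place of the two rmmod lines:
#   mods=$(nvidia_unload_order)
#   [ -n "$mods" ] && rmmod $mods    # unquoted on purpose: one argument per module
```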

How to Optimise Your Google Adwords Marketing Campaign

I recently set up Google Adwords for my colleagues at the MySQL consultancy Shattered Silicon, and there are a few things that I think anyone using Google Adwords to advertise products and services should know in order to avoid wasting their marketing budget. All of these things are pretty obvious in hindsight, but if you dive in with no experience, you will likely find yourself wasting a lot of your budget on advertising to an audience that isn’t looking to buy.

Location, Location, Location!

Advertise only in places where companies can plausibly afford to engage your services. In broad terms, this means that if you are based in and doing business from an OECD country, then unless you operate in a niche that bucks this rule, there is a good chance that clients who aren’t in another OECD country can’t afford you. So make sure you set your advertising campaign to specifically target the countries that are of interest to you and to exclude the countries that are not. Of course, this is not a hard rule: there could be benefit to including additional countries, e.g. if you have staff who speak the language.

There is another aspect to this: in countries where the economy isn’t doing great, there are more likely to be people looking for work in the industry you operate in who will click your advert on the off chance that you may be hiring. So unless you are struggling to hire remote workers, this is another reason to focus on locations where those expensive clicks are likely to bring you visits from prospective clients, rather than prospective employees or competitors.

Which brings us onto the next point.

Exclude Job-Seeker Terms

Unless you are looking to hire, exclude commonly used job-seeking terms, such as “hire”, “junior”, “senior”, “internship”, “training”, “position”, “opening” and any others you can think of. In my case, this significantly reduced the number of impressions and clicks that were never going to turn into sales.

Avoid Ambiguous Terms

It is worth investing some time in researching whether there is a tool with a similar name to the service you are offering. For example, if you are advertising the services of a MySQL administrator, you may find that most of the clicks you get are from people looking for the deprecated tool of the same name. Similarly, either avoid advertising MySQL tuning services or disambiguate them from the MySQL tuning tool.

Ideally, try to avoid such terms completely, or at the very least use specific rather than broad matching and add exclusions for the words used in things other than what you are trying to market.

No Freebies

Presumably, if you are spending your hard-earned money on marketing to grow your business, you are running a commercial operation rather than a charity. So you should probably exclude terms like “free”, “trial”, “download” and other terms typically used in searches for things that people don’t expect to pay for.

Audience Age

Another very obvious optimisation, when you think about it, is targeting by age range. If you are selling a highly technical service aimed at business owners and CTOs, it is unlikely they will be younger than their mid-20s or older than retirement age. So exclude those age ranges from your advert targeting.

Use the Price Extension on your Advert

You may think that showing the prices of 2-3 of your most popular products will put people off clicking on your ad, but that is a good thing. If a price tag puts somebody off, there is a good chance they aren’t looking to buy anyway, so they might as well not cost you a click.

Effect on Impressions, CPC and CTR

If you are doing it right, you will find that your total conversion cost goes down, but, counter-intuitively at first glance, your CPC will go up and your impression count will go down. This is because your competitors have probably already gone through a similar exercise, and you are homing in on the genuinely valuable impressions that they are also competing for. In the case I have been working on, the CPC almost doubled overnight once I applied all of the above optimisations, but the cost per conversion went down overall.

Hopefully, these thoughts will get your marketing efforts off to a smoother and cheaper start than they would otherwise have had.

HP G7 MicroServer ILO SSH Connectivity

I recently pressed a G7 MicroServer back into service, and discovered that I couldn’t connect to it over SSH. This seemed odd, given that I am quite certain I remember doing so before. A quick nmap scan showed that the SSH port was definitely open on the ILO:

Starting Nmap 6.40 ( http://nmap.org ) at 2020-06-07 23:06 BST
Nmap scan report for 192.168.0.2
Host is up (0.0086s latency).
Not shown: 992 closed ports
PORT STATE SERVICE
22/tcp open ssh
80/tcp open http
111/tcp open rpcbind
427/tcp open svrloc
443/tcp open https
2068/tcp open advocentkvm
5988/tcp open wbem-http
5989/tcp open wbem-https

Increasing the verbosity of the ssh connection (ssh -v) yielded some interesting insight, specifically:

debug1: match: OpenSSH_5.2 pat OpenSSH_5* compat 0x0c000000

So it could in fact be that the more modern ssh client tries to connect with ciphers and protocol options that the now relatively ancient OpenSSH 5.2 doesn’t understand. So I quickly grabbed OpenSSH 5.2 portable, built it and tried with that, and – success! Running it again with verbosity turned up showed which cipher and MAC were used. I added the following to my ~/.ssh/config:

Host 192.168.0.2
Ciphers aes128-ctr
MACs hmac-md5

And lo and behold, ssh-ing to the ILO from recent ssh on EL8 now works!
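Picking the negotiated cipher and MAC out of the verbose output by eye is easy enough, but a small filter helps when comparing several clients. This sketch reads ssh -v output on standard input and prints the server-to-client cipher and MAC. It assumes the kex debug line formats I believe OpenSSH uses, the older "debug1: kex: server->client aes128-ctr hmac-md5 none" and the newer "debug1: kex: server->client cipher: aes128-ctr MAC: hmac-md5 compression: none", so check them against your own client's output. negotiated_algos is a hypothetical name.

```shell
#!/bin/bash
# Read `ssh -v` output on stdin and print "<cipher> <mac>" for the
# server->client direction. Handles the older positional format and the
# newer labelled "cipher: ... MAC: ..." format of the kex debug line.
negotiated_algos() {
    awk '$2 == "kex:" && $3 == "server->client" {
        if ($4 == "cipher:") print $5, $7   # newer labelled format
        else print $4, $5                   # older positional format
        found = 1
        exit
    }
    END { exit !found }'
}

# Usage: save the debug output of a login attempt (it goes to stderr), then filter it:
#   ssh -v Administrator@192.168.0.2 2> ssh.log   # user name is an example
#   negotiated_algos < ssh.log
```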

Hopefully this will save somebody some time in the future, or prevent them from throwing away what is still a perfectly usable microserver.

Windows 10 – the horror of upgrade to 1903+ releases

I had the misfortune the other day of one of my machines updating beyond the 1809 release. Three changes immediately infuriated me.

1. Colour scheme change and removal of black menus

The colour scheme changes upon upgrade. Not only is it a rather bright one, harking back to the ’90s style and contrary to dark themes that are easier on the eyes, laptop batteries and screens prone to burn-in, but there is no way to fully revert it. To get anywhere near black, it is now necessary to disable the transparency effects, and even then it is still not black but dark grey.

Visually, this is a huge downgrade that cannot be fully mitigated or undone. And a bit of googling will show that I am far from the only one infuriated by this change.

2. File explorer grouping by default instead of continuing with previous default

As soon as you open your folders after the upgrade, the files are grouped by default. I can understand that this might be a new default setting, but on an upgrade, old behaviour should be preserved. If somebody chose not to view their files grouped, that choice should not be overridden by the upgrade, making them waste their time putting things back the way they were. Worse, it seems to randomly revert to the new, annoying default.

Microsoft needs to understand that the ONLY reason why anyone uses Windows instead of a different operating system is long term inertia of familiarity. Chipping away at that familiarity will just push more users away from Windows and toward other operating systems. This is user alienation by a million paper cuts.

3. Start menu auto-expand

This one is incredibly annoying for such a minor change. There is supposedly a way to disable it using the mach2 tool, but that didn’t work for me: no matter what I did, I couldn’t get it to turn off and stay off.

Conclusion

So what did I do? Well, this is a VM. I have refused to run Windows on bare metal for over a decade, because it is too prone to getting itself into a state where various things break in a way that none of the googlable solutions fix, and the only remedy is to format and reinstall (a broken Windows Update is a common showstopper). Being a VM, it doesn’t run on a real disk but on a virtual one. I run my VMs on ZFS zvols, which are regularly snapshotted. So I powered off the VM, performed a zfs rollback to a pre-upgrade snapshot, and as if by magic, all of the damage done by the post-1809 release was undone, and things are back in a state where at least the new annoyances are gone. Best of all, the zfs rollback took a few milliseconds, instead of a lengthy Windows restore that probably wouldn’t have put the machine back into exactly the state it was in before.
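The rollback itself is a single zfs rollback command against the VM's zvol; the fiddly part is picking the right snapshot. A sketch, assuming a hypothetical zvol named tank/vm/win10 and a snapshot listing in parsable form (zfs list -H -p prints tab-separated values, with creation times as Unix timestamps), that picks the newest snapshot created before a given time. Note that zfs rollback -r destroys any snapshots newer than the one rolled back to, and the VM must be powered off first.

```shell
#!/bin/bash
# Read "name<TAB>creation-epoch" lines sorted by creation time (oldest first)
# and print the newest snapshot created strictly before the cutoff epoch.
last_snapshot_before() {
    awk -F '\t' -v cutoff="$1" '
        $2 + 0 < cutoff + 0 { snap = $1 }   # later matching lines overwrite earlier ones
        END { if (snap == "") exit 1; print snap }'
}

# Usage (hypothetical zvol name, GNU date; VM powered off first):
#   snap=$(zfs list -H -p -t snapshot -o name,creation -s creation -d 1 tank/vm/win10 |
#          last_snapshot_before "$(date -d 2020-06-01 +%s)") &&
#   zfs rollback -r "$snap"
```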

The upgrade will no doubt be forced on me at some point, in a way I can no longer avoid. Feature upgrades can only be postponed by 365 days and disabled for another 30. But at least today is not that day.

Plea to Microsoft

Please, for the love of God, stop changing things in ways that nobody asked for, nobody wanted, and nobody likes. Make your designers and developers learn from the fiasco of the ribbon interface.

Hardware Accelerated SSL on ARM – Redux

A long time ago, I posted an article about the advantages of hardware-accelerated SSL encryption and how to get it working on Fedora Linux. Since then, some things have improved, and some have regressed.

Improvements:

Regressions:

  • Red Hat have broken OpenSSH with their audit patch. This is particularly inconsistent given that the distro-supplied openssh package in EL6 is built with the --with-ssl-engine option to enable support for hardware crypto acceleration, yet this is clearly completely untested, which raises the question of what the point of it is.

Thankfully, the regression mentioned above can be fixed to make sshd work properly with hardware crypto offload.

Here are links to patched OpenSSL and OpenSSH packages for EL6, current at the time of writing this article:

http://ftp.redsleeve.org/pub/el6/packages/soc/SRPMS/openssl-1.0.1e-30.el6.11.cryptodev.src.rpm

http://ftp.redsleeve.org/pub/el6/packages/soc/SRPMS/openssh-5.3p1-104.el6.1.cryptodev.src.rpm

While ssh using the blowfish algorithm in software is very fast and good enough for general-purpose usage, for some operations, such as transferring ZFS snapshots over ssh, hardware-offloaded AES provides a very welcome performance boost, because it leaves more CPU available for other processes.

ZFS-FUSE 0.7.1 Released

The last official release of zfs-fuse was years ago, and it was seriously starting to fall behind other implementations. It was effectively abandoned, which is quite inconvenient considering it is still the only viable option on 32-bit Linux installations (e.g. on ARM, or for those still tied to i686 for legacy reasons).

Since I use Linux on ARM heavily, I have spent the past few weeks working on changing this. The last official release, 0.7.0, was made by Seth Heeren a few years ago, and it supported ZFS pool versions up to v23. Emmanuel Anne was maintaining an unofficial post-0.7.0 branch that added support for pool versions up to v26. Over the past couple of years, other people have contributed a few patches here and there (manual ashift setting at boot time, some patches to add support for ARM, a couple of out-of-tree patches shipped with the Fedora package). Over the past few weeks I needed a few additional features that exist in other implementations, particularly for running a root file system on it (mount.zfs for legacy mount points, and better systemd/initramfs integration), so I added those features. It also transpired that a few of the patches that made it into the official 0.7.0 release weren’t in Emmanuel’s code tree, since it was forked before the official 0.7.0 release. I located and backported those from Seth’s maint branch on GitHub.

With all this done, and with no other volunteers showing any interest in maintaining zfs-fuse further, it seems to have fallen to me to cut the 0.7.1 release. I have tested it extensively on my ARM systems with pools of various sizes (16GB to 16TB) and complexities (single disk to RAIDZ2), and it has been very stable.

If you are stuck on a 32-bit Linux platform and would love the features of ZFS, you can find the latest release of zfs-fuse on GitHub:

https://github.com/gordan-bobic/zfs-fuse

Future work will include adding support for additional pool versions. I have already created branches for those, but this will need extensive testing before I deem it stable enough for a release. If you are interested in helping with either development or testing of zfs-fuse, please do get in touch.

EVGA SR-2 – Long Term Review

Having used EVGA’s once-flagship, and possibly their most hyped-up ever, motherboard for the past two and a half years, and having extensively fought its many bugs and quirks over that period in many uses it was supposed to be capable of in theory but was clearly never tested against, it seemed like a good idea to collate all the issues and workarounds into a single article. These findings have been cross-checked against multiple SR-2 motherboards.

Hardware / BIOS / POST

While there are various minor, annoying bugs in the BIOS itself, I will not go into details of those and will instead focus on the issues of real practical significance.

96GB of RAM

The Xeon X5xxx series specification states that each CPU can address 192GB of RAM. Unfortunately, the EVGA SR-2 specification only states that it can handle up to 48GB of RAM. This is more than a little disappointing, but there is a way to persuade it to complete the POST with 96GB, using twelve 8GB x4 dual-ranked registered DDR3 DIMMs. Insert 6 of them into the red memory slots, and boot up. Set the following:

  • MCH strap: 1600MHz
  • Memory speed: 1333
  • Manually set all the memory timings to what they were auto-detected to be
  • Set the command rate to 2T
  • No voltage increases are required just because you have 96GB – if your DIMMs are rated at 1.35V, then there is no need to set DIMM voltages higher than 1.35V.

Insert the remaining 6 DIMMs and it should now be able to boot with 96GB. The POST may take 2-3 cycles to complete, but within 30 seconds or so you should see the BIOS splash screen. Once it has booted up, a soft reboot will complete without delay. It only takes a little while on a cold boot.

Don’t expect 96GB to POST at much over 167MHz BCLK.

Unfortunately, more than 96GB will not work.

Watch out for SpeedStep Side Effects

If you enable SpeedStep but disable TurboBoost, the CPU will still boost to +1 multiplier. This is not intuitive and can cause you problems during stability testing.

Clock Generator Stability

Above 180MHz BCLK, expect to see very noisy clock signals. If you watch the clock speeds on a monitoring application, you will notice that the clock speeds will regularly spike very high and very low. This means that the stability above 180MHz BCLK is not going to be appropriate for any serious use.

Virtualization With VT-d / IOMMU

All the PCIe slots on the SR-2 are behind Nvidia NF200 PCIe bridges. Unfortunately, these have a bug: they do not route all DMA via the upstream root PCIe hub. The consequence is that when a virtual machine with PCI passthrough tries to access memory at a physical address within its virtual sandbox that overlaps with the physical range of a PCI IOMEM area mapped to any physical device, the access will be routed to that physical device rather than remapped out of the way. When this happens, at best it will crash the host when the physical card crashes and takes the PCIe bus down with it. At worst, the memory access will trample the region mapped to a disk controller, which can easily result in garbage being written to disk, and then the host will crash anyway.

The workaround is to make sure, by whatever means are available, that the virtual machine does not access the area between 1GB and 4GB, which is the area reserved for mapping PCI I/O memory. Two years ago, the only solution available to me was to write a patch for Xen’s hvmloader that marked that entire memory area as reserved. In theory you could also tell your guest OS simply not to use that memory (e.g. using bcdedit in Windows 7 and later to mark the area as badmem, or using mem= parameters to the Linux kernel). Today, with the latest version of QEMU for Xen and KVM, you can instead pass the max-ram-below-4g=1G parameter to the -machine option, which achieves the same thing much more cleanly and with no ill side effects (such as 3GB of RAM going missing in the guest).
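The reason no RAM goes missing is simple arithmetic: with max-ram-below-4g=1G, QEMU places at most 1GB of guest RAM below the 4GB boundary and puts the remainder above it, so the guest keeps all of its memory while nothing is mapped into the 1GB to 4GB hole. A sketch of that simplified model, in MiB; it ignores small reserved regions and any machine-type specifics, and guest_ram_layout is a hypothetical name:

```shell
#!/bin/bash
# Simplified guest physical RAM layout in MiB: up to max_below MiB of RAM sits
# at the bottom of the address space; the rest starts at 4096 MiB (4 GiB).
guest_ram_layout() {
    local total=$1 max_below=$2
    local below=$total
    [ "$below" -gt "$max_below" ] && below=$max_below
    local above=$((total - below))
    echo "low:  0-$below MiB"
    if [ "$above" -gt 0 ]; then
        echo "high: 4096-$((4096 + above)) MiB"
    fi
}

# Example: a 16GiB guest with max-ram-below-4g=1G
#   guest_ram_layout 16384 1024
```

For the 16GiB guest in the example, this model gives 1GiB at the bottom of the address space and the remaining 15GiB from 4GiB upwards, with nothing in the 1GB to 4GB window.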

Note that even with this workaround, there will still be weird, seemingly DMA-related crashes on the SR-2 when you have VT-d enabled and you use SAS controllers. For some reason this motherboard really does not play well with them (I tried three different generations of LSI, an Adaptec and a 3Ware). Some controllers will simply have no disks show up when you boot the kernel with intel_iommu=on (older LSI, Adaptec); others will seem to work but randomly crash when a VM with PCI passthrough is running (3Ware). Simple SATA controllers do not seem to suffer from this problem.

Marvell 88SE9123 SATA-3 6 GBit controller

This may nominally be a 6GBit/s SATA controller, but be aware that its upstream connection is a single PCIe 2.0 lane (x1), with a maximum signalling rate of 5GBit/s. That means the maximum throughput you can possibly get from both of these SATA ports (the red ones on the board) combined is about 450-500MB/s. Bear this in mind if you are planning to connect a pair of SSDs: you will achieve higher overall throughput by connecting the second SSD to the ICH10 SATA-2 controller (the black ports on the board), even though the latter only supports up to 3GBit/s.
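These figures follow from the 8b/10b line coding used by both PCIe 2.0 and SATA: ten bits on the wire carry one byte of data, so a link's raw payload rate in MB/s is its signalling rate in Mbit/s divided by ten. A quick sketch of that arithmetic (link_mbytes is a hypothetical helper; the results are raw rates before protocol overhead, which is why real-world figures land a little lower):

```shell
#!/bin/bash
# Raw payload rate in MB/s for an 8b/10b-coded link:
# signalling rate (Mbit/s per lane) x lanes / 10 bits per byte.
link_mbytes() {
    local mbit_per_lane=$1 lanes=$2
    echo $(( mbit_per_lane * lanes / 10 ))
}

# link_mbytes 5000 1   -> PCIe 2.0 x1 uplink shared by both Marvell ports
# link_mbytes 6000 1   -> one SATA-3 port
# link_mbytes 3000 1   -> one ICH10 SATA-2 port
```

With these numbers, two SSDs on the Marvell ports share a 500MB/s uplink, while one SSD on the Marvell controller plus one on an ICH10 port can reach up to about 500 + 300MB/s.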

Overclocking with Westmere Xeons

The settings I have used with great success for the past 2.5 years, in addition to those mentioned above required for operation with 96GB of RAM are:

  • CPU Core Voltage: 1.300V. This is sufficient for up to 4GHz. You may need to go as far as 1.350V for 4.15GHz, but beyond that no voltage increase will keep things stable.
  • VTT Voltage: 1.325V. This is sufficient up to about 3.33GHz uncore speeds, which is about as far as you can realistically expect to get out of Westmere Xeons. Do not under any circumstances push this past 1.350V as it is almost guaranteed to damage the CPU regardless of how good the cooling is.
  • BCLK: <= 180MHz. My experience is that this is as far as you can go before clock frequencies start to spike all over the place. In the interest of stability, I would recommend not exceeding 177MHz, as this is where the 4.8GT/s QPI setting actually runs at the 6.4GT/s that all the components are rated for, and there seems to be almost no headroom at all for QPI overclocking on components of this generation.
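The 177MHz figure is multiplier arithmetic. Assuming the 4.8GT/s QPI setting is a fixed multiple of the nominal 133MHz BCLK (4800 / 133.33 = 36), raising BCLK scales QPI with it, and the highest whole-MHz BCLK that keeps QPI at or below the 6.4GT/s rating is 6400 / 36, rounded down to 177. A sketch with hypothetical helper names, using integer MHz:

```shell
#!/bin/bash
# Effective rate (MHz or MT/s) = BCLK (MHz) x multiplier.
effective_rate() {
    echo $(( $1 * $2 ))
}

# Highest whole-MHz BCLK that keeps a multiplied rate at or below its rating:
# rated rate / multiplier, rounded down.
max_bclk() {
    echo $(( $1 / $2 ))
}

# effective_rate 177 36   -> QPI in MT/s at 177MHz BCLK
# effective_rate 180 36   -> QPI in MT/s at 180MHz BCLK, beyond the 6400 rating
# max_bclk 6400 36        -> 177
```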

Motherboard Heatsink Fan

As far as I have been able to establish, the fan only makes an appreciable difference with a combination of extreme BCLK overclocking, IOH over-volting, and use of most if not all of the 64 PCIe lanes available through the PCIe slots. In more typical use (two PCIe x16 GPUs, 166MHz BCLK, a relatively low 1.250V on the IOH), the difference between the fan being fully on (approx. 5000 rpm) and completely off is around 9C (46C fully on, 55C completely off). Consequently, in some cases it may be preferable to remove the aluminium duct plate surrounding the fan, disconnect the fan, and leave the heatsink to passively cool the Intel 5520 I/O Hub, Intel ICH10 South Bridge, and Nvidia NF200 PCIe bridges. The airflow from the case fans is likely to be more than sufficient in most if not all installations. This also prevents the sometimes extreme, yet invisible, dust build-up in the heatsink fins under the duct plate, which otherwise pushes temperatures higher than they would be with no active fan or duct plate present.

Linux

Hot-plug Flapping

This will show up as soon as you start the installer for any distribution you choose: you will receive a flood of messages on the console, which will make the system grind to a halt. The cause is an on-board device that is erroneously marked as hot-pluggable, which results in ASPM causing it to flap between plugged and unplugged states. Disabling ASPM in the BIOS is not sufficient to fix this. The workaround is to add pcie_ports=compat to the list of kernel boot parameters.
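On EL-family systems, grubby --update-kernel=ALL --args=pcie_ports=compat makes the parameter persistent for the installed kernels. After rebooting, a quick check that the running kernel actually received it can save some head-scratching. A sketch (has_kernel_param and its optional file argument are hypothetical):

```shell
#!/bin/bash
# Succeed if the kernel command line contains the exact parameter given,
# e.g. "pcie_ports=compat". Only whole space-separated words match.
has_kernel_param() {
    local param=$1 cmdline_file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Usage:
#   has_kernel_param pcie_ports=compat || echo "pcie_ports=compat not set" >&2
```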

Intel HD Audio Line Mapping

This took me a while to work out, and had me thinking I had a failed audio port. The front panel connector uses an unusual port, so it produces no output and does not even emit ACPI events when something is connected or disconnected. The solution is to produce a correct pin map and supply it to the driver (problems like this turn out to be so common that the snd-hda-intel driver can load such a map at startup).

Simply put this in /lib/firmware/hda-jack-retask.fw:

[codec]
0x10ec0889 0x00000000 2

[pincfg]
0x11 0x411111f0
0x12 0x59a3112e
0x14 0x01014c10
0x15 0x01011c12
0x16 0x01016c11
0x17 0x01012c14
0x18 0x01a19c40
0x19 0x02a19c50
0x1a 0x01813c4f
0x1b 0x0321403f
0x1c 0x411111f0
0x1d 0x4015e601
0x1e 0x01441130
0x1f 0x01c46160

And put this in /etc/modprobe.d/hda-jack-retask.conf:

options snd-hda-intel patch=hda-jack-retask.fw,hda-jack-retask.fw,hda-jack-retask.fw,hda-jack-retask.fw

That should solve the problem.
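The retask file only applies to a codec whose vendor ID matches its [codec] line (0x10ec0889 above), so it is worth checking what your board actually reports before wondering why nothing changed. A sketch comparing the two, assuming the "Vendor Id: 0x..." line found in the ALSA codec proc files (/proc/asound/card*/codec#*); codec_matches_patch is a hypothetical name:

```shell
#!/bin/bash
# Compare the vendor ID in an hda-jack-retask [codec] section with the
# "Vendor Id:" line of an ALSA codec proc file; succeed if they match.
codec_matches_patch() {
    local patch_file=$1 codec_file=$2
    local want have
    # First field of the first non-empty line after the [codec] header.
    want=$(awk '/^\[codec\]/ { grab = 1; next }
                grab && NF { print $1; exit }' "$patch_file")
    # Value after "Vendor Id:" in the codec proc file.
    have=$(awk -F': *' '/^Vendor Id:/ { print $2; exit }' "$codec_file")
    [ -n "$want" ] && [ "$want" = "$have" ]
}

# Usage:
#   codec_matches_patch /lib/firmware/hda-jack-retask.fw '/proc/asound/card0/codec#0'
```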

Final Words

Unfortunately, it took many man-days over the past two years to work all this out and find the solutions. It is not acceptable that a high-end flagship product of the sort the SR-2 was presented as is so buggy and requires so much troubleshooting from the end customer. While the SR-2 has its place in history as the board that allowed overclocking Xeons, alongside gems from long ago such as the A-bit BP6, which allowed dual-socket operation with Celerons, in the time it took to work around all of its bugs it has become deprecated, discontinued and unsupported. Meanwhile, top-of-the-line X5690 Xeons now sell for little enough on the second-hand market that the gains simply do not justify the effort, as they appeared to 2-3 years ago when starting with the several times cheaper X5650 processors.

In retrospect, when the effort is accounted for, a similar build using a pair of X5690 Xeons and a Supermicro X8DTH-6F motherboard would have almost certainly been a cheaper and less problematic experience. It might not have any overclocking functionality, but while offering the same number of PCIe x16 slots (7) and memory sockets (12), it does support 192GB of RAM (4x more than the SR-2 in the same number of sockets) without any special undocumented approaches required to make it work, and comes with an 8-port SAS controller on-board, while suffering from none of the problems above. Something that just works is usually much more economical than something that ends up requiring many days of troubleshooting effort.

Virtually Gaming, Part 2: Evolution – Consolidation and Move to KVM

In the previous article in this series, I detailed the journey to my original configuration with a single host providing multiple gaming capable virtual machines as a multi-seat workstation. But things have changed since then – many game distribution platforms such as Steam, GOG and Desura have native Linux versions, and many games have been ported to run natively on Linux. The vast majority of the ones that haven’t now work perfectly under WINE.

Consequently, the ideal solution has changed as well. In the original configuration, there were 3 seats on the system: two Windows VMs for gaming and one Linux VM for more serious use. At least one of the Windows VMs could now be removed, and its functionality replaced with WINE and native ports.

At the same time KVM advanced greatly in features and stability, and is now much better aligned with the requirements of this multi-seat workstation project. Perhaps most importantly, the latest QEMU even provides a feature that provides a much better workaround for the issue I had to patch Xen’s hvmloader for: max-ram-below-4g (option to the -machine parameter). Setting this to 1GB comprehensively works around the IOMMU compatibility bug of the Nvidia NF200 PCIe bridges on the EVGA SR-2, without any negative side effects.

Even better, KVM also includes patches that neuter the Nvidia driver’s ability to detect it is running in the VM (add kvm=off to the list of options passed to the -cpu parameter). That means that modifying the GPU firmware or hardware to make it appear as a Quadro or Tesla card is no longer required for using it in a virtual machine. This is a massive advantage over the original Xen solution for most people.

Summary of the most significant changes:

  • Host system updated to EL7 (CentOS)
    Required to facilitate easier running of more recent kernels and Steam (no more need to build and update an additional package set to support Steam as on EL6, including glibc). On the downside – this necessitates putting up with systemd.
  • Xen replaced by KVM
  • Windows 7 VM now uses UEFI instead of legacy BIOS
    This does away with all of the legacy VGA complications, such as VGA arbitration, and the OVMF UEFI firmware even loads and executes the PCI devices’ option ROMs during the VM’s POST, which results in the full splash screen and even the UEFI BIOS configuration menus being available during VM boot on the external console.
  • XP x64 VM removed
    Superseded by using native Linux game ports and WINE for the rest (so far every XP compatible game I have tried works)

Some of the extra repositories I used for this are:

OVMF UEFI and SeaBIOS Firmware repository from here: https://www.kraxel.org/repos/

Mainline kernel from elrepo repository: http://elrepo.org/tiki/tiki-index.php

Bleeding edge QEMU (needed for the max-ram-below-4g option).

The full libvirt xml configuration file I use for QEMU is here:


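<name>edi</name>
<uuid>11111111-1111-1111-1111-111111111111</uuid>
<memory>16777216</memory>
<currentMemory>16777216</currentMemory>
<vcpu>4</vcpu>
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>GENERIC</entry>
    <entry name='version'>GENERIC</entry>
    <entry name='date'>01/01/2014</entry>
    <entry name='release'>0.91</entry>
  </bios>
  <system>
    <entry name='manufacturer'>GENERIC</entry>
    <entry name='product'>GENERIC</entry>
    <entry name='version'>GENERIC</entry>
    <entry name='serial'>1</entry>
    <entry name='uuid'>11111111-1111-1111-1111-111111111111</entry>
    <entry name='sku'>GENERIC</entry>
    <entry name='family'>GENERIC</entry>
  </system>
</sysinfo>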

<type>hvm</type>












<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>

<emulator>/usr/libexec/qemu-kvm</emulator>




1

The reason for the qemu:commandline section is that libvirt and especially virt-manager do not actually understand all possible QEMU parameters. The ones that they don’t support directly are in this section to avoid errors and complaints from virsh and virt-manager in normal use.

You may also notice that there are some unusual sections and values in there, so let me touch upon them in groups.

Windows Activation and Associated Checks

When you first activate Windows with a key, it keeps track of several important details of the hardware in order to detect whether the same installation has been moved into another machine. Most licenses (e.g. OEM ones) are not transferable to another machine. So in order to ensure that our installation is portable (e.g. if we upgrade to a different hypervisor at a later date), we set the various values to something static, easily memorable and predictable, so that if we ever need to migrate the VM to another host, it will not cause deactivation issues. The important settings are here (these are not in all cases complete sections, only the fragments required for this purpose, see above for the full configuration):

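<uuid>11111111-1111-1111-1111-111111111111</uuid>
<sysinfo type='smbios'>
  <bios>
    <entry name='vendor'>GENERIC</entry>
    <entry name='version'>GENERIC</entry>
    <entry name='date'>01/01/2014</entry>
    <entry name='release'>0.91</entry>
  </bios>
  <system>
    <entry name='manufacturer'>GENERIC</entry>
    <entry name='product'>GENERIC</entry>
    <entry name='version'>GENERIC</entry>
    <entry name='serial'>1</entry>
    <entry name='uuid'>11111111-1111-1111-1111-111111111111</entry>
    <entry name='sku'>GENERIC</entry>
    <entry name='family'>GENERIC</entry>
  </system>
</sysinfo>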

  


  
    1
  

Nvidia Bugs/Features Workarounds

The following sections are required in order to work around the NF200 PCIe bridge bugs (max-ram-below-4g=1G) and the Nvidia driver feature that disables GeForce GPUs in virtual machines (kvm=off):


  
  
  
  

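Because libvirt passes qemu:commandline arguments through without interpreting them, it is worth confirming that they actually reached the running QEMU process. /proc/<pid>/cmdline holds a process's arguments separated by NUL bytes; here is a sketch that checks it for a given string. qemu_cmdline_has is a hypothetical name, and the pgrep pattern in the usage comment is only an example to adapt to your setup.

```shell
#!/bin/bash
# Succeed if any argument in a NUL-separated /proc/<pid>/cmdline file
# contains the given text, e.g. "max-ram-below-4g=1G" or "kvm=off".
qemu_cmdline_has() {
    local cmdline_file=$1 text=$2
    tr '\0' '\n' < "$cmdline_file" | grep -qF -- "$text"
}

# Usage (example pattern; pick the right PID if several VMs are running):
#   pid=$(pgrep -f qemu-kvm | head -1)
#   qemu_cmdline_has "/proc/$pid/cmdline" kvm=off && echo "kvm=off is active"
```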
CPU Configuration

<cpu>
  <topology sockets='1' cores='4' threads='1'/>
</cpu>
This is important because most non-server editions of Windows only allow up to two CPU sockets. By default QEMU presents each CPU core as being on a separate socket. That means that no matter how many CPUs you pass to your Windows VM, while they will all show up in Device Manager, only a maximum of two will be used (you can verify this using Task Manager). What the above configuration block does is instruct libvirt to tell QEMU to present four cores in a single CPU socket, so that all are usable in the Windows VM.

VFIO and Kernel Drivers

In my system I have two identical Nvidia GPUs. Numerically, the second one is primary (host), and the first one is the one I am passing to a virtual machine. I am also passing the NEC USB 3.0 controller to the VM. This is the script I wrote (in /etc/sysconfig/modules/) to bind the devices intended for the VM to the VFIO driver:

#!/bin/bash
# Both GTX 780 Ti cards, in bus order: the first is passed to the VM,
# the last stays with the host.
nvidia1=$(lspci | grep "GTX 780 Ti" | head -1 | awk '{print $1}')
nvidia2=$(lspci | grep "GTX 780 Ti" | tail -1 | awk '{print $1}')
# Each card's HDMI audio is function .1 in the same slot as its GPU (.0).
hda1=$(echo "$nvidia1" | sed -e 's/\.0$/.1/')
hda2=$(echo "$nvidia2" | sed -e 's/\.0$/.1/')
# NEC USB 3.0 controller for the VM.
nec=$(lspci | grep "NEC" | awk '{print $1}')

# Pin each device to the driver it should end up on.
echo nvidia        > /sys/bus/pci/devices/0000:$nvidia2/driver_override
echo snd-hda-intel > /sys/bus/pci/devices/0000:$hda2/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$nvidia1/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$hda1/driver_override
echo vfio-pci      > /sys/bus/pci/devices/0000:$nec/driver_override

# Load vfio-pci and register the vendor/device IDs it may claim.
modprobe vfio-pci
echo 10de 1284     > /sys/bus/pci/drivers/vfio-pci/new_id
echo 10de 0e0f     > /sys/bus/pci/drivers/vfio-pci/new_id
echo 1033 0194     > /sys/bus/pci/drivers/vfio-pci/new_id

# Detach the VM's devices from whatever driver claimed them at boot...
echo 0000:$nvidia1 > /sys/bus/pci/devices/0000:$nvidia1/driver/unbind
echo 0000:$hda1    > /sys/bus/pci/devices/0000:$hda1/driver/unbind
echo 0000:$nec     > /sys/bus/pci/devices/0000:$nec/driver/unbind
# ...and bind them to vfio-pci.
echo 0000:$nvidia1 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:$hda1    > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:$nec     > /sys/bus/pci/drivers/vfio-pci/bind

# Finally, load the Nvidia driver for the host's card.
modprobe nvidia

Note that the PCI bus IDs will change if you add more hardware to the machine. That is why I wrote this script, rather than assigning the devices statically by ID. The above script works for me on my hardware. You will almost certainly need to modify it for your configuration, but it should at least give you a reasonable idea of an approach that works.

Important: The devices the script identifies have to match the relevant hostdev sections in your libvirt XML config file. You will have to adjust those manually for your configuration, using either virsh edit or virt-manager.
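To check that the binding took effect, you can look at which driver each device is attached to. lspci -k shows this as "Kernel driver in use". The same information is available from the driver symlink in the device's sysfs directory. The sketch below reads that symlink. The function name is hypothetical, and the PCI address in the usage line is only an example; take yours from lspci.

```shell
#!/bin/bash
# Sketch: report which kernel driver a PCI device is bound to, by reading
# the "driver" symlink in its sysfs directory. Function name is hypothetical.
bound_driver() {
    local dev=$1
    if [ -L "$dev/driver" ]; then
        # The symlink points at .../bus/pci/drivers/<name>; print <name>.
        basename "$(readlink "$dev/driver")"
    else
        echo none
    fi
}

# Usage (example address; expect "vfio-pci" for devices given to the VM):
#   bound_driver /sys/bus/pci/devices/0000:03:00.0
```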

Also, depending on your hardware, you may need to do the initial Windows installation on the emulated GPU rather than the real one. This applies, for example, if the VM uses a USB controller that requires additional drivers, as is the case with the USB 3.0 controller I am using for my VM. Otherwise you will get display output but be unable to use your keyboard and mouse during the installation.

Gaming on Linux: Steam

A pre-packaged Steam binary used to be available from the RPM Fusion repository, but it no longer appears to be there. Thankfully, negativo17 maintains a Steam repository for Fedora 20+, which installs and runs fine on EL7. You may also need to grab a few RPMs from Fedora 19, because EL7 doesn't ship with a full complement of 32-bit libraries. The ones I found I needed are these:

libbsd-0.6.0-3.fc19.i686
libtxc_dxtn-1.0.0-3.fc19.i686
libxkbcommon-0.3.0-1.fc19.i686
openal-soft-1.16.0-2.fc19.i686
SDL2-2.0.3-1.fc19.i686
SDL2_image-2.0.0-4.fc19.i686

These packages come from Fedora 19 because F19 is virtually identical to EL7 in terms of package versions.
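Before installing, it can help to see which of these 32-bit packages are already present. rpm -q accepts a name.arch argument, so each name-version-release.arch string above only needs trimming down to its name. The sketch below does that with bash parameter expansion. The helper name is hypothetical.

```shell
#!/bin/bash
# Sketch: strip ".arch", "-release" and "-version" from an RPM
# name-version-release.arch string to get the package name.
# Function name is hypothetical.
rpm_name() {
    local n=${1%.*}   # drop the architecture, e.g. ".i686"
    n=${n%-*}         # drop the release, e.g. "-3.fc19"
    echo "${n%-*}"    # drop the version, e.g. "-0.6.0"
}

# Usage: report which of the Fedora 19 packages are not installed yet.
#   for p in libbsd-0.6.0-3.fc19.i686 SDL2-2.0.3-1.fc19.i686; do
#       rpm -q --quiet "$(rpm_name "$p").i686" || echo "missing: $p"
#   done
```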

Typically, the Steam RPM installation is a one-off, mostly to bootstrap the initial run and install the dependencies. After that, a local copy of Steam is installed in the user's home directory, in ~/.local/share/Steam/. In light of the recent Steam bug that resulted in deletion of the user's entire home directory, I implemented a solution that runs Steam as a separate steam user, from that user's own home directory. That way, should anything similar ever happen, the only thing deleted would be the steam user's home directory rather than any important files unrelated to running Steam games.

To do this, you will need to add a steam user and give it the necessary permissions:

$ sudo adduser steam
$ sudo usermod -a -G audio,games,pulse-access,video steam

Add the following to /etc/sudoers.d/steam:

%games ALL = (steam) NOPASSWD: /bin/steam, /bin/pkill dbus-launch
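The %games rule applies to members of the games group, so your own desktop user also has to be in that group, not just the steam user. It is also worth running visudo -cf /etc/sudoers.d/steam to check the file's syntax before relying on it. The membership check can be sketched as follows. The helper name is hypothetical.

```shell
#!/bin/bash
# Sketch: succeed if USER is a member of GROUP. Function name is hypothetical.
in_group() {
    local user=$1 group=$2
    # id -nG prints the user's group names separated by spaces;
    # split them one per line and look for an exact, literal match.
    id -nG "$user" | tr ' ' '\n' | grep -qxF "$group"
}

# Usage:
#   in_group "$USER" games || sudo usermod -a -G games "$USER"
```

A new group membership only takes effect after you log out and back in.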

Create the following script (e.g. /usr/local/bin/steam.sh):

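#!/bin/bash
# Allow the local steam user to draw on this user's X display.
xhost +SI:localuser:steam
# Let the audio group (which steam belongs to) reach the PulseAudio socket.
chgrp audio /run/user/$UID /run/user/$UID/pulse
chmod 750 /run/user/$UID /run/user/$UID/pulse
# Run Steam as the steam user.
sudo -u steam /usr/bin/steam
# Kill any dbus-launch processes left running under the steam user.
sudo -u steam pkill dbus-launch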

From then on, when you invoke steam.sh, it will launch Steam as the steam user and pass the graphical output to the Xorg session of the logged-in user. The net result is that any potentially damaging bug in Steam or associated games can only damage files owned by the steam user. This security model is not dissimilar to Android's, where every application runs under its own user for similar security reasons.

Gaming on Linux: WINE

There are two obvious options for this:

1) PlayOnLinux

2) More traditional WINE (I use the one from DarkPlayer’s repository)

I only had to make one configuration change to WINE, and that is to disable the dwrite.dll library in WINE (to disable it, run winecfg, go to Libraries -> add dwrite.dll, edit dwrite.dll entry and set it to disabled). I am using XP version emulation, which isn’t even supposed to include dwrite.dll, and the problem it causes is that fonts are invisible in Steam and some other applications.
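As an alternative to the winecfg change, which applies to the whole WINE prefix, the same override can be set for a single launch. You do this through the WINEDLLOVERRIDES environment variable, where an empty load order after "=" disables the DLL. The wrapper below is a sketch with a hypothetical function name, and the program in the usage line is a placeholder.

```shell
#!/bin/bash
# Sketch: run a command with dwrite.dll disabled for this invocation only.
# An empty load order ("dwrite=") tells WINE not to load the DLL at all.
# Function name is hypothetical.
without_dwrite() {
    WINEDLLOVERRIDES="dwrite=" "$@"
}

# Usage (program path is a placeholder):
#   without_dwrite wine some-program.exe
```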

End Result

The end result is a much cleaner virtual machine configuration. Thanks to the NF200 bug workaround, no RAM goes missing as it did with Xen, and there is no need to modify my GeForce cards in hardware. The performance seems very smooth, and so far the entire setup has been completely trouble-free.

There is also one fewer virtual machine and one fewer GPU in the system without any loss of functionality. Should I require an additional seat in the future, it will most likely be a Linux one, and implemented using a Xorg multi-seat configuration.

Microsoft Security Essentials on 64-bit XP

Yet another Windows-related article; this detour from more typical content is expected to be short-lived.

Microsoft Security Essentials was never officially supported on 64-bit Windows XP, but version 2 nevertheless installed on it and worked fine. Version 4 (version 3 never existed) refuses to install directly, saying that the version of Windows is unsupported. However, if you install version 2, the version 4 installer will happily run and install version 4 as an upgrade. It will pop up a message every time you log in warning that XP64 is EOL, but otherwise it will work just fine. So the trick is to install version 2 and then upgrade to version 4.

You may be wondering why this is relevant. My findings are that most realtime anti-malware programs thoroughly cripple performance. I used to run ClamWin+ClamSentinel as one of the least bad options, but even this was quite crippling. MSSE, on the other hand, is much more lightweight, and has thus far proved itself to be as effective in tests as most of the alternatives. The overall performance of the system is now much more acceptable.

Chrome Installer Error 0xc0000005 on Windows XP

I don't tend to write much about Windows, because its usefulness to me is limited to functioning as a Steam boot loader. Even that usefulness is somewhat diminished now that Steam and an increasing number of games are available for Linux. Unfortunately, I recently had to do some testing that needed a Windows application, and I noticed that Chrome reported the above error when attempting to update itself.

The Chrome installer crashes with the opaque 0xc0000005 error code on XP64 (Chrome is still supported on XP, even though MS is treating XP as EOL). Googling the problem suggested that disabling the sandbox might help. That isn't really applicable here, because the problem occurs in the installer rather than in the running browser: Chrome itself runs fine, and only updating it triggers the error.

A quick look at the crash dump revealed that one of the libraries dynamically linked at crash time was the MS Application Verifier, used for debugging programs and sending them fake information on what version of Windows they are running on. Uninstalling the MS Application Verifier cured the problem.