The Quality Assurance Process with Levi
Hi! I'm Levi, the quality assurance manager at System76. I’m here to give you a small peek behind the curtain at what our QA process looks like for a brand new product.
The first thing that we do when we receive a new piece of hardware is to install Pop!_OS on it and see how well it works. If you’ve ever installed Linux on a machine that originally came with Windows, you probably have a pretty good idea of what that can look like. If we find any weird bugs, we write down the steps to reproduce them, then make bug reports for the engineers to look into.
FIRMWARE
On machines that will run open firmware, the Engineering team will start developing Open Firmware and embedded controller firmware (EC firmware) almost right away. The EC firmware can be thought of as the control center for the laptop. It controls things like power, fans, keyboard backlight, keyboard mapping, touchpad, etc.
In contrast with the EC firmware, the main system firmware focuses on initializing and running the CPU, along with many of the components connected to it, such as memory and PCI devices. Our engineers will typically “move into” the machine, developing the firmware on the very machine it’s being written for. This helps them develop firmware and fix bugs very quickly, since a form of testing is also happening in tandem.
FLASHING
Next, we often have to use external flashing tools to flash firmware onto the QA machines. The engineers typically have their own units and we have a couple of units in the QA lab. The very first flash sometimes requires external flashing tools that interface with the BIOS chip directly. These tools are also needed to “unbrick” a machine that has any serious firmware problems during testing.
BIOS firmware flashing
Our preferred BIOS flasher is the Raspberry Pi because it is faster; the other flasher we use is the CH341A. Both of these tools interface directly with the BIOS chip itself. The newer wired plug is used on the more modern WSON-8 chip package and has pogo pins that touch each pin. The older style is used for SOIC-8 chips, and it has a clip that attaches to the chip instead of needing to be held. Since the WSON-8 plug doesn’t clip onto the chip, the plug must be held for the duration of the flash, making these chips much harder to flash. If you move even just a little bit, you might break one of the contacts, which interrupts the flash and corrupts the ROM on the chip.
EC Firmware Flashing
Embedded controller firmware is externally flashed through the keyboard’s ribbon connector. We remove the keyboard, unplug the keyboard’s FPC cable, and plug in an FPC cable that attaches to an Arduino Mega via an FPC breakout board. It sometimes takes a couple tries to line up the FPC cable correctly in the socket since it’s a few pins narrower than the keyboard’s FPC cable. EC firmware can also be internally flashed from the OS, once the machine already has open EC running. The same is true for BIOS firmware as well. Internal flashing is how we’re able to apply firmware updates over-the-air and in the wild.
HARDWARE COMPATIBILITY
The main tool we use for hardware certification is a simple checklist. These checklists are continuously changing, usually getting longer, as we find more and more new features to test and bugs to re-test for. They typically include things like system control hotkeys on the keyboard, various suspend/resume behaviors, checking every single port on the machine, power behaviors, and much more.
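As a rough illustration of the checklist idea, here is a minimal Python sketch. The `CheckItem` type, the `needs_attention` helper, and the sample item names are all hypothetical, made up for this example; they are not System76's actual tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckItem:
    """One line of a hypothetical hardware certification checklist."""
    name: str
    passed: Optional[bool] = None  # None means "not tested yet"
    notes: str = ""


def needs_attention(items: List[CheckItem]) -> List[str]:
    """Return the names of items that failed or have not been tested."""
    return [item.name for item in items if item.passed is not True]


# Example items, loosely based on the categories mentioned above.
checklist = [
    CheckItem("keyboard hotkeys", passed=True),
    CheckItem("suspend/resume", passed=False, notes="fails on lid close"),
    CheckItem("USB-C port"),  # not tested yet
]

print(needs_attention(checklist))
```

Anything that comes back from `needs_attention` is either a bug to write up or a test that still has to be run.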
We have various types of NVMe drives, SATA drives, and RAM chips that we use to test the machines. We mainly focus on the types of hardware that we sell the units with, but we also test a lot of other brands. We know that many users end up adding drives and RAM, and we want to make sure that it’s going to be a painless process.
Ports and charging
There are a lot of power behaviors in our checklist. To test charging, we drain the laptop battery down until it dies, plug it into the charger, then make sure it turns back on as expected and runs smoothly while it charges back to full without interruptions.
Machines with a U-class CPU like the Darter Pro, Galago Pro, Lemur Pro, or the Pangolin are a little bit more complicated because they support USB-C charging. That means we have to make sure they work well with docking stations and docking monitors, which provide charging over USB-C, deliver a video signal over USB-C, and handle any USB, audio, or ethernet functionality that the dock may have. So that USB-C port is doing a lot at once, and it all has to work.
Even though it’s generally simpler, we also check the barrel charger. Connecting the barrel charger and USB-C charger to the laptop at the same time is something the EC firmware should be able to handle, and normally the barrel charger is given precedence for charging. We also check that it’s not trying to charge through both connections at the same time. I haven’t seen a laptop attempt that yet, but it sounds like it would be bad.
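The precedence rule described here can be sketched in a few lines of Python. This is only an illustration of the behavior we check for, not the EC firmware's actual logic; the `Source` enum and `select_charge_source` function are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    BARREL = "barrel"
    USB_C = "usb-c"


def select_charge_source(barrel_connected: bool,
                         usb_c_connected: bool) -> Optional[Source]:
    """Pick at most one charging source; the barrel charger wins a tie."""
    if barrel_connected:
        return Source.BARREL
    if usb_c_connected:
        return Source.USB_C
    return None  # nothing plugged in, so run on battery


print(select_charge_source(True, True))
```

Because the function returns a single value, charging through both connections at once is impossible by construction, which is the property the test is looking for.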
Power limits
You can only get so much out of the battery safely, so there are safety mechanisms that turn the machine off when too much power is drawn from the battery too quickly. Making sure those limits aren’t hit is something I test extensively. Laptops that get halfway through compiling a project, then suddenly shut down because the CPU got a little too power-hungry are aggravating to use.
Similarly, chargers all have power limits as well. Overdrawing power from a charger usually makes it reset or turn off. Some of them even need to be unplugged from the wall and plugged back in to reset them after an overdraw. Crawling under your desk several times a day to reset your charger also gets old very quickly.
Making sure we don't hit any of those power limits is super important. If we see any of that happen during testing, power settings in firmware are re-adjusted to keep everything working smoothly.
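To make the idea concrete, here is a small Python sketch that scans a series of power readings for samples that come too close to a limit. The function name, the 5% headroom, and the sample wattages are assumptions chosen for illustration, not real measurements or real firmware settings.

```python
from typing import List, Tuple


def find_overdraws(samples_w: List[float], limit_w: float,
                   headroom: float = 0.05) -> List[Tuple[int, float]]:
    """Return (index, watts) for each sample above the limit minus headroom."""
    threshold = limit_w * (1.0 - headroom)
    return [(i, w) for i, w in enumerate(samples_w) if w > threshold]


# Hypothetical readings in watts, checked against a hypothetical 90 W charger.
readings = [60.0, 85.0, 88.0, 70.0]
print(find_overdraws(readings, limit_w=90.0))
```

Any sample that shows up in the result is a hint that the power settings may need another adjustment before the limit is actually hit.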
Benchmarking
Benchmarking is how we make sure that a machine is putting out all of the performance that it can. Users might be building projects, working with machine learning, or playing high-end video games. So all of that needs to work, and we want to make sure it works well.
Video games are particularly fun to test. We'll often just fire up a game and play it for a little while to make sure it’s running smoothly. Games that have built-in benchmarking tools, like Deus Ex: Mankind Divided or Cyberpunk 2077, are very nice because they can run themselves and provide framerates after each run. If framerates start dipping, then I look into what’s causing the dips by monitoring sensor data or reading through system logs.
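One simple way to spot a dip across repeated benchmark runs is to compare each run against the median. The Python sketch below does that; the `find_dips` name, the 10% tolerance, and the framerates are assumptions for the example rather than numbers from our lab.

```python
from statistics import median
from typing import List


def find_dips(fps_per_run: List[float], tolerance: float = 0.10) -> List[int]:
    """Return indexes of runs more than `tolerance` below the median fps."""
    if not fps_per_run:
        return []
    floor = median(fps_per_run) * (1.0 - tolerance)
    return [i for i, fps in enumerate(fps_per_run) if fps < floor]


# Hypothetical average framerates from five back-to-back benchmark runs.
runs = [60.0, 61.0, 59.0, 48.0, 60.0]
print(find_dips(runs))
```

A flagged run is the cue to go look at sensor data and logs from that point in time.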
Actually using a machine the way it's going to be used by the people that want to play games is important. I often bring home prototypes in the evening just to play some games at home. Playing my games in my familiar setup helps give me a great idea of how the machine is performing.
We also utilize Phoronix Test Suite pretty extensively. There are so many great benchmarks, suites, and data at openbenchmarking.org that we love to use. Being able to test different components and specific workloads, then compare data between multiple different machines is so incredibly valuable.
General stress testing is also something we do quite a lot. Good ol’ `stress-ng` is the quick go-to for stress testing. We’ve also found that Folding at Home is a nice, heavy load that can quickly expose a wide variety of potential power or thermal issues.
SOUND
Acoustic performance is important. Since the QA lab is in a literal factory, which gets pretty noisy, we had to come up with a way to check how loud laptops would get. We made a sound-deadened cube in the factory and painted it up like a big Rubik's cube. It’s basically a simple drywall and wood construction, but it is insulated heavily in the walls. There’s also a layer of sound-deadening foam on the inside walls. Having this quiet environment in the factory is important to make sure the machine sounds right and isn’t overly noisy. If it is, we do the best we can to quiet it down.
Conclusion
When we get a brand new potential product, QA testing can be a complicated process that can take weeks or months to complete. What I described was just a peek into this process. If you have any questions about it, feel free to reach out to me. My username in our Pop!_OS Mattermost chat is @leviport, and I’m always happy to answer questions.
And with that, until next time! Take care.