When working on an emulator, a feature never really feels finished. Last month, crudelios triumphed with his new software bounding box implementation. It was easily the most accurate implementation of the feature to date. A few months before that, magumagu created more accurate disc timings so that games load at speeds closer to those of real hardware. RachelBryk has been steadily adding features for TASing for years. Perhaps a longer-term, more general project is Sonicadvance1's continued work on Dolphin's ARM port, which is always receiving updates.
All of those features have seen further refinements and work this month that enhances their usability! Sometimes these changes are from one author continuing their work, but a lot of the time other contributors will join in with their own ideas, fixes, and add-ons. Just because a feature exists in the emulator doesn't mean it can't be improved further!
Notable Changes¶
4.0-4143 - Re-enable ARMV7 Floating Point Register Cache¶
4.0-4243 - Rewrite ARM Fastmem and¶
4.0-4394 - Tons of ARMV7 Fixes/Optimizations by Sonicadvance1¶
For people waiting for some major ARM/Android speedups, look no further! The Floating Point Register (FPR) cache fix lets Dolphin retain guest FPRs in host FPRs for longer than a single instruction for a sizable performance boost!
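To get a feel for why a register cache helps, here is a minimal sketch in Python. It is a toy model, not Dolphin's ARM JIT: the class `FprCache`, its fields, and the register names are all hypothetical, and a dictionary stands in for the guest register file in memory.

```python
class FprCache:
    """Toy register cache: keeps guest registers in a small set of host registers."""

    def __init__(self, guest_memory, host_slots):
        self.guest_memory = guest_memory  # guest register file stored in RAM
        self.host_slots = host_slots      # number of host registers available
        self.cached = {}                  # guest register -> value held in a host register
        self.dirty = set()                # cached registers modified since they were loaded
        self.loads = 0                    # memory reads performed
        self.stores = 0                   # memory writes performed

    def _evict(self, reg):
        # Write the value back only if it changed while cached.
        if reg in self.dirty:
            self.guest_memory[reg] = self.cached[reg]
            self.stores += 1
            self.dirty.discard(reg)
        del self.cached[reg]

    def read(self, reg):
        if reg not in self.cached:
            if len(self.cached) >= self.host_slots:
                self._evict(next(iter(self.cached)))  # evict the oldest binding
            self.cached[reg] = self.guest_memory[reg]
            self.loads += 1
        return self.cached[reg]

    def write(self, reg, value):
        if reg not in self.cached and len(self.cached) >= self.host_slots:
            self._evict(next(iter(self.cached)))
        self.cached[reg] = value
        self.dirty.add(reg)

    def flush(self):
        """Write every modified register back, e.g. at the end of a JIT block."""
        for reg in list(self.cached):
            self._evict(reg)


memory = {"f0": 1.0, "f1": 2.0, "f2": 0.0}
cache = FprCache(memory, host_slots=4)
cache.write("f2", cache.read("f0") + cache.read("f1"))
cache.write("f2", cache.read("f2") * cache.read("f1"))
cache.write("f2", cache.read("f2") + cache.read("f0"))
cache.flush()
```

In this toy run, three instructions cost two loads and one store. Reloading and storing around every single instruction would cost six loads and three stores for the same work.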
To follow that up, Sonicadvance1 also worked on fixing up the ARM fastmem. While ARM has supported this feature for quite some time, the way it was written has made it difficult to maintain and a lot of loadstore instructions don't support it. Sonicadvance1 rectified the situation by rewriting it from the ground up and getting it to work for the remaining loadstore functions, resulting in a very noticeable speedup in games that rely on those instructions.
These patches combined (among others that didn't make the cut for notable changes) make Dolphin ARM quite a bit faster. While getting performance data on Android/ARM devices is an exercise in frustration thanks to how wildly different all of the drivers are, the gain is estimated to be around 20% according to benchmarks.
For Sonicadvance1 and his NVIDIA K1 Jetson unit, that allows him to sometimes reach full speed on the Airport of Super Mario Sunshine and well over full speed in the secret stages.
Other users have also taken to their NVIDIA Shield Tablets and found some games to be very playable, including some Wii titles that can use the GameCube Controller. Special thanks to all of the users sharing videos of their Dolphin Android experience!
A lot of people had hoped that the Nexus 9, featuring the NVIDIA K1 Denver, would be the first Android device to comfortably run a few games at full speed. Sonicadvance1 got his hands on one and has done some preliminary testing. Sadly, the jury is still out. From initial tests, it benchmarks even with or slightly worse than the NVIDIA Shield Tablet and its NVIDIA K1 Jetson. But don't fret: Sonicadvance1 has hope that with some time and optimization, Dolphin will be able to wring the potential out of the device as the AArch64 JIT matures.
4.0-4023 - TAS Input: Nunchuck Support by RachelBryk¶
Ever wanted to be the perfect player in your favorite games? Dolphin offers TAS tools such as input recording, savestates, and frame advance to assist players in whatever tasks they wish to undertake. Most people will use savestates merely as extra saves before a hard section of a game, or input recording to keep their favorite Super Smash Bros. Melee matches immortalized without the need for huge videos.
But for speedrunners and glitch hunters in particular, there are other features that can sometimes be even more important. One of those features is called "TAS Input", a special form of handling input that allows users to set the angles of joysticks manually and, when playing back input recordings, see all inputs in action. With RachelBryk's nunchuck support added, players can push the limits of games that require the nunchuck controller. Players will be able to find the perfect angles for tricks, glitches, and optimizations. Mix this with Dolphin's movie support and you could soon find yourself becoming the ultimate player!
4.0-4203 - FIFO Overflow Fix by skidau¶
People who have been updating Dolphin constantly may have been noticing a lot of these images. For weeks, a nasty and very difficult-to-track bug has been afflicting developers and users alike.
This error could be as minor as just an annoying popup, but it could result in graphical glitches, freezes, hangs and even full on crashes. Where did this bug come from? Why wasn't the patch that caused it immediately reverted? Unfortunately, this scenario was a bit more complicated and required a much harder look at why things were going wrong.
The GameCube and Wii are essentially single-core video game consoles, with the CPU and GPU locked together in perfect sync while operating in a known environment. With all of these factors concrete and controlled, the CPU feeds graphics commands to the GPU predictably and reliably. But Dolphin has to make many different CPU and GPU combinations work together in Dual Core mode. Not only do the host CPU and GPU run at very different speeds than the GC/Wii hardware, with practically infinite possible combinations, but the emulated CPU and GPU threads are also isolated on different cores. To control all of these factors, Dolphin uses its FIFO code to keep the two threads running approximately within spec.
To make things easier, imagine this as a ring buffer where both the CPU thread and the GPU thread go around it over and over again, like cars on a racetrack. Dolphin has to make sure that the CPU thread and the GPU thread pass the "start line" at roughly the same time, even though both threads are operating at different speeds and the GPU thread can only go as fast as the system's GPU will allow it.
The CPU thread always stays a little ahead of the GPU thread.
But if the CPU thread laps the GPU thread, bad things happen.
If the GPU thread gets ahead of the CPU thread, it is catastrophic.
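The racetrack picture can be sketched as a tiny ring buffer in Python. This is only an illustration of the idea, not Dolphin's FIFO code: the class `RingFifo` and its fields are hypothetical, and real code would also need thread synchronization, which is left out here.

```python
class RingFifo:
    """Fixed-size ring buffer with one writer (CPU thread) and one reader (GPU thread)."""

    def __init__(self, size):
        self.buf = [None] * size
        self.size = size
        self.write_pos = 0  # where the CPU thread writes next
        self.read_pos = 0   # where the GPU thread reads next
        self.count = 0      # commands written but not yet read

    def write(self, command):
        # CPU lapping the GPU: no free slot is left, so unread data would be overwritten.
        if self.count == self.size:
            raise OverflowError("FIFO overflow: CPU thread lapped the GPU thread")
        self.buf[self.write_pos] = command
        self.write_pos = (self.write_pos + 1) % self.size
        self.count += 1

    def read(self):
        # GPU ahead of the CPU: there is nothing valid to read yet.
        if self.count == 0:
            raise LookupError("FIFO underflow: GPU thread got ahead of the CPU thread")
        command = self.buf[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.size
        self.count -= 1
        return command
```

As long as the writer stays between zero and `size` commands ahead of the reader, both positions can wrap around the buffer forever. The two error cases correspond to the two bad situations described above.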
If Dolphin didn't care about performance, the solution would be to operate in single core mode with checks in place to make sure everything was in perfect sync. But Dolphin wants to achieve high accuracy with playable performance, and since 99% of games are very loose with their CPU and GPU timings, Dolphin exploits this for huge performance gains, such as Dual Core mode. But speed doesn't matter if the emulator is always crashing! Any change to the FIFO can cause severe issues like this, which is why everyone is extremely careful when touching this sensitive code.
Worryingly, nothing had recently changed in the FIFO. This error came out of nowhere and was linked to an EXI-Timing change that was undeniably more accurate than the previous implementation. Nothing that it did should have caused widespread issues that led to many games crashing. This put the developers in a very difficult position. Revert a change that was believed to be more accurate to avoid the bug? Or keep the more accurate code and hope that another error is found that has both sides working properly?
skidau took up the unenviable task of diving into Dolphin's FIFO. What followed were many different concepts, ideas, and implementations of how to possibly fix this bug. Frustration slowly set in as change after change ended up failing. Testers and developers alike sought out and bought afflicted games, like Star Fox Adventures and Battalion Wars 2, in order to increase the volume at which ideas could be tested.
The problem ended up being a lot less convoluted than expected. For some reason, the EXI-Timing change uncovered a bug in which the GPU thread wasn't telling the CPU thread often enough that the FIFO was almost full. By the time the signal to stop feeding it reached the CPU thread, the FIFO had already overfilled and thrown up a Panic Alert.
skidau ended up lowering that delay by increasing the frequency of command processor interrupts. As a reward for his hard work, skidau not only fixed the regressions that popped up, but also increased the stability in games that were having FIFO problems beforehand, such as The Last Story.
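Why the frequency of that "almost full" signal matters can be shown with a small simulation. This is a rough sketch with made-up numbers: the function `run_fifo`, its parameters, and the fill rates are hypothetical and are not taken from Dolphin's command processor.

```python
def run_fifo(capacity, high_watermark, check_interval, steps, produce=4, consume=3):
    """Return the step at which the FIFO overflows, or None if it never does."""
    fill = 0
    cpu_paused = False
    for step in range(steps):
        # The "almost full" status only reaches the CPU side every check_interval steps.
        if step % check_interval == 0:
            cpu_paused = fill >= high_watermark
        if not cpu_paused:
            fill += produce
            if fill > capacity:
                return step
        fill = max(0, fill - consume)
    return None


rare_checks = run_fifo(capacity=64, high_watermark=48, check_interval=32, steps=1000)
frequent_checks = run_fifo(capacity=64, high_watermark=48, check_interval=4, steps=1000)
```

With a check only every 32 steps, the producer blows past the watermark and overflows before the next check arrives. With a check every 4 steps, the same producer is paused in time and the buffer never overflows in this run.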
Fixed issue 7835
4.0-4219 - OpenGL: Hardware Bounding Box Implementation by degasus¶
Last month, crudelios revolutionized bounding box emulation with a new software implementation. This month, degasus built upon that with a new hardware implementation for OpenGL. This allows OpenGL to have all of the benefits of the software implementation with the added bonus of being able to offload the work onto the computer's graphics card!
While this implementation is actually more accurate than the software variant, it's also interesting to look at the performance implications.
As expected with a hardware implementation, faster GPUs are much, much better at handling this than older GPUs. This was also tested on a Radeon HD5850, but considering the card can't run Paper Mario: The Thousand-Year Door at full speed at higher IRs without bounding box, we did not feel that showing the performance results for it would be worthwhile. Just imagine very low performance with hardware bounding box in OpenGL. Radeons tend to have better performance in the D3D backend regardless.
On that note, much like with the XFB-Scaling situation a few months ago, different developers specialize in D3D and OpenGL, so this implementation is incomplete in the sense that it only supports OpenGL currently. Anyone experienced with the D3D API may find porting this implementation to Dolphin's other hardware backend an easy starter task.
4.0-4222 - Support Constant Angular Velocity For Disc Reads by JosJuice¶
More data can be read per rotation on sectors near the outside of the disc.
This one really feels like overkill at first glance. The GameCube/Wii disc drives use Constant Angular Velocity when reading discs. No matter where on the disc the drive is reading, the disc is always spinning at the same speed.
Because the outside of the disc is a much longer path than the inside of the disc, more sectors are read per degree of rotation as you go toward the outer rim. In most cases, this wouldn't matter at all: Wii and GameCube disc drives vary so much that magumagu's previous disc seek accuracy work fixed almost every issue with disc read speeds.
Metroid Prime was one of the greatest beneficiaries of the disc timing improvements last time around. They made a lot of the tricks and sequence breaks that speedrunners loved possible in the emulator. But on more than one occasion, reports came in that Dolphin was still far too fast. One of the cool tools that revealed this was the Metroid Prime Randomizer. Because the game wasn't guaranteed to be completable in this form, speedrunners were forced to think outside of the box and use all kinds of crazy strategies to get items and weapons.
One common complaint among people attempting this feat was that some of the clips and tricks weren't possible on the emulator. This didn't matter much for conventional gameplay, but TASers soon discovered that the same flaw actually changed everything about speedrunning the game. While glitches and sequence breaks were the main issues, even a basic room needs to be played very differently with faster disc read timings.
The reason for all of this is that Dolphin was reading at a set 3000KB/s, which is actually below the maximum read speed of a Wii disc drive. Setting the read speed any lower than this would cause stuttering in disc-intensive videos in games such as Gauntlet: Dark Legacy and Mario Golf: Toadstool Tour, which have their data toward the outside of the disc. There was only one solution: implement CAV in the emulator.
JosJuice, a newcomer to the project, showed up wanting to fix this exact problem. While they went through some growing pains getting acclimated to the emulator and had trouble compiling, within a few weeks they brought forward a patch that added CAV reading to the emulator. It works by calculating where on the disc the data is based on the byte position in the ISO, and reading at the appropriate rate.
While a lot of different games were tested for load times, videos, and other possible stress tests, none were more telling than when CleanRip was used. While the program doesn't run perfectly under Dolphin, it works well enough that you can rip an ISO loaded into Dolphin. By doing that, we can check the disc speeds (with some level of accuracy) at various points of a disc and compare them to Wii consoles.
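The geometry behind that calculation can be sketched in a few lines of Python. This is a simplified model rather than the actual patch: the radii are placeholder values, the function names are hypothetical, and the model assumes data is laid out at uniform density from the inner edge of the data area outward.

```python
import math

# Placeholder geometry for illustration only; not Dolphin's constants.
INNER_RADIUS_MM = 20.0
OUTER_RADIUS_MM = 60.0


def radius_at(offset, disc_size):
    """Radius (mm) of the track holding byte `offset`, assuming uniform data density.

    With uniform density, the amount of data stored inside radius r grows with
    the area of the ring between the inner radius and r, hence the square root.
    """
    fraction = offset / disc_size
    inner_sq = INNER_RADIUS_MM ** 2
    outer_sq = OUTER_RADIUS_MM ** 2
    return math.sqrt(inner_sq + fraction * (outer_sq - inner_sq))


def relative_read_rate(offset, disc_size):
    """Read rate at `offset` relative to the innermost track.

    At constant angular velocity, the track length passing under the laser per
    second is proportional to the radius, and so is the data rate.
    """
    return radius_at(offset, disc_size) / INNER_RADIUS_MM
```

Multiplying `relative_read_rate` by the drive's read rate at the innermost track would give an absolute speed for any byte position in the ISO. With these placeholder radii, the outer rim reads three times as fast as the inner edge.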
While this was being tested, the question quickly became: Can you correctly rip a disc in Dolphin and then run that in Dolphin? Naturally, we had to find out.
JosJuice also stuffed a few more goodies into the patch. Adding a slight delay (~0.067ms) to many of the miscellaneous DVD commands has already produced a few reported changes in behavior. The troublesome subtitles in Star Fox Adventures finally show up again and Sonic Riders no longer needs "Fast Disc Speed" timings to avoid a crash. All in all, this merge is a huge win for accuracy.
4.0-4235 - More Vectorized Vertex Loading by Fiora¶
Dolphin's vertex loader is one of the key components in the GPU pipeline. Without it, the host computer's GPU wouldn't be able to understand any of the Wii/GC's commands to load vertices, which include position, texture coordinates, normals, and color information. All of these commands need to be translated into something that a typical PC GPU will be able to process.
A lot of this code can be optimized through the use of SSE instructions (particularly SSSE3 in this case), but this was only done for a few vertex formats. Fiora decided that just wasn't good enough and went through and optimized position, texture coordinate, and normal loading. This results in a very large speedup in games that were limited by Dolphin's vertex loader.
Of course, the caveat to getting this optimization is that the processor must support SSSE3. Otherwise, it'll just fall back to what Dolphin was doing prior to this point. For Intel users, support starts at the Core2Duo line, and on the AMD side, the Bobcat line of CPUs.
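The fast-path-with-fallback pattern can be sketched in Python. Everything here is an assumption made for illustration: the vertex format (big-endian signed 16-bit fixed-point positions), the function names, and the `has_fast_path` flag are hypothetical, and a single batched unpack only stands in for real SSSE3 code.

```python
import struct


def load_positions_generic(data, count, frac_bits):
    """Fallback path: decode one big-endian signed 16-bit component at a time."""
    scale = 1.0 / (1 << frac_bits)
    out = []
    for i in range(count * 3):
        (raw,) = struct.unpack_from(">h", data, i * 2)
        out.append(raw * scale)
    return out


def load_positions_batched(data, count, frac_bits):
    """Fast path stand-in: decode every component with a single unpack call."""
    scale = 1.0 / (1 << frac_bits)
    raw = struct.unpack_from(">%dh" % (count * 3), data)
    return [r * scale for r in raw]


def select_loader(has_fast_path):
    """Pick the loader once, based on what the host CPU supports."""
    return load_positions_batched if has_fast_path else load_positions_generic


# Two vertices (x, y, z each) stored with 8 fractional bits.
vertex_data = struct.pack(">6h", 256, -256, 128, 0, 512, -128)
loader = select_loader(has_fast_path=True)
positions = loader(vertex_data, count=2, frac_bits=8)
```

Both loaders must produce identical output, so a CPU without the fast path gets the same results, only more slowly.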
4.0-4381 - JIT: Optimize Single Precision Operations Based on Knowledge of their Inputs by Fiora¶
Back in August, Fiora fixed the rounding in floating-point multiplication. This change was needed for accurate physics in many popular games, including F-Zero GX and Mario Kart Wii. This did come at a small speed cost; the rounding had to be emulated for all multiplication. This is one of many extra operations Dolphin does to remain accurate to the hardware.
This set of optimizations begins with the idea that, if Dolphin can prove that in some cases such extra operations aren't actually necessary, it can just skip them entirely in many games. The details are a bit technical, but the result is a significant performance improvement across most titles, especially floating-point heavy ones.
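The general idea can be shown with a toy example in Python. This is not Fiora's patch: the instruction format, the function names, and the tracking scheme are hypothetical. It only demonstrates that rounding an already-single value is a no-op, so a JIT that knows where a value came from can leave that step out.

```python
import struct


def round_to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def plan_rounding(instructions):
    """Decide, per instruction, whether its input still needs explicit rounding.

    `instructions` is a list of (dest, src, produces_single) tuples for a toy
    instruction set. A source register only needs rounding when it is not
    already known to hold a single-precision value.
    """
    known_single = set()
    plan = []
    for dest, src, produces_single in instructions:
        plan.append(src not in known_single)
        if produces_single:
            known_single.add(dest)
        else:
            known_single.discard(dest)
    return plan


block = [
    ("f1", "f0", True),   # f0 comes from outside the block: must round
    ("f2", "f1", True),   # f1 was just produced as a single: skip
    ("f3", "f2", False),  # f2 is a known single: skip; f3 becomes a double
    ("f4", "f3", True),   # f3 is a double: must round
]
needs_rounding = plan_rounding(block)
```

In this toy block, half of the rounding steps turn out to be provably redundant and can be dropped without changing any result.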
There are potentially more improvements to be made in this area; Fiora had some other, more difficult ideas that didn't quite make it into this patch. Perhaps next month.
Dolphin Qt User Interface Progress Update¶
Dolphin has been using wxWidgets since its open source inception. While it was great for a time, and has gone through many, many iterations, its limitations and flaws constantly cause headaches for users and developers alike. One of the plans to remedy this was to move to a Qt interface. Back in September, initial commits hit the emulator but amounted to no more than the UI returning a "Hello World." Since then, a lot of progress has been made and as of 4.0-4006, the Qt interface can even boot up games! While it's nowhere near completion, this is a huge step toward a revolutionary new design.
But that's not all: the new GameList that waddlesplash made supports new display modes for the game list!
DolphinQt's tree view will allow you to narrow down the game you're looking for by folder. Whatever folder it is placed in will show up in the tree, so you can arrange them by console, genre, franchise, whatever you want!
Grid view is exactly what it sounds like. All games are laid out with their covers and names, maximizing how many games can fit on the screen at once. Users with lots of games may have trouble picking them out, but there's no better way to show off your collection to friends!
Currently the only way to view the DolphinQt UI is to build Dolphin yourself and pull in the Qt submodule. Because of its incomplete state, this should only be tried by advanced users/developers, especially since this UI will do nothing a conventional user would care about anyway. In the future though, Qt should be bigger and better than the wxWidgets UI, and vastly easier for the developers to work with.
For those curious as to the challenges of writing a new UI for an old program, waddlesplash has written about WX and the Qt rewrite from a more personal take here.
As per usual, we'd like to thank...¶
All of our committers, contributors, testers, and users that supported us throughout the month of November! Have a safe and happy holiday season!