Dolphin Progress Report: April 2017


One of the more difficult parts of being an emulator is balancing accuracy, performance and presentation. When Dolphin replaced the hacky, broken asynchronous audio with the synchronous New AX-HLE and New Zelda-HLE implementations, audio accuracy greatly increased! It came as quite the shock when users started complaining about this change and demanding asynchronous audio's return. Some of the criticisms were valid; there were bugs in early synchronous audio causing increased latency that weren't present in asynchronous audio.

All of these growing pains were eventually fixed, but one complaint stood out: slowdown affected audio for the first time for a majority of users. This was seen as an unfixable issue. After all, it doesn't make sense for audio to run full speed if nothing else is! The issues were closed and the concern was filed away until users got used to the change.

Long-term, we did learn something from this dilemma. While synchronous audio was undoubtedly better for the project and solved the major emulation issues with audio, it caused a whole bunch of presentation issues we neglected to fix... until now.

This month, we have a lot to offer. Custom texture support gets supercharged, the JIT sees some important maintainability changes, and a smattering of audio changes include a huge presentation change to audio that will help users hear games pleasantly even under slowdown.


Notable Changes


Position Independent Code support for the x86-64 JIT by MerryMage

This is a big change that opens up a lot of uses for Dolphin that may not be very obvious. In fact, if everything is working correctly, there should be no noticeable difference in Dolphin from the many changes made to bring its JIT in line with Position Independent Code (PIC). So what is PIC?

The JIT supporting PIC means that no matter where its code ends up in memory relative to the rest of the executable, it will still be able to reach everything it needs. Because the x86 JIT was designed in the 32-bit era, Dolphin used the access methods available at the time. Code was guaranteed to be within 2GB (the maximum amount of RAM a 32-bit program can use), and thus the JIT relied on methods within those limitations, even when making the jump to 64-bit!

That reliance becomes a problem because some operating systems, such as Android and Hardened Linux, randomize the memory layout as a security feature, meaning that code can end up more than 2GB away! The AArch64 JIT was designed with this situation in mind and as such can handle the code being anywhere! However, x86_64 Android devices were stuck with the cached interpreter at best because the x86_64 JIT didn't have PIC support.

What MerryMage did was go through the JIT and find the individual instructions that used addressing methods incompatible with PIC, and then revise them in a manner that'd allow them to work. The hardest part about doing this isn't the task itself; it's completing the task without losing performance. Thankfully, by doing the process instruction by instruction, we were able to keep close tabs on performance and make sure there were no large performance decreases.
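To make the 2GB limit concrete, here is a minimal sketch of the range check involved. It relies on one fact about x86-64: RIP-relative operands and near jumps/calls encode their target as a signed 32-bit displacement from the end of the instruction. The function name and the addresses are made up for illustration; this is not Dolphin's emitter code.

```python
# Illustrative model only; not taken from Dolphin's JIT.
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def fits_in_rel32(instruction_end: int, target: int) -> bool:
    """Return True if `target` is reachable with a signed 32-bit displacement.

    The displacement is measured from the end of the instruction, which is
    how x86-64 RIP-relative addressing and near jumps/calls work.
    """
    displacement = target - instruction_end
    return INT32_MIN <= displacement <= INT32_MAX


# Hypothetical addresses chosen for illustration.
jit_code = 0x0000_7F00_1000_0000
nearby_data = jit_code + 0x1000_0000    # 256 MiB away
far_data = jit_code + 0x1_0000_0000     # 4 GiB away

print(fits_in_rel32(jit_code, nearby_data))  # True
print(fits_in_rel32(jit_code, far_data))     # False
```

When a check like this fails, the emitted code has to reach the target some other way, for example by first loading the full 64-bit address into a register, which is where the performance concerns come from.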

As mentioned above, x86-64 Android should now be able to run Dolphin's JIT. Considering that the x86-64 JIT is still much more mature than the AArch64 JIT, it is likely x86-64 Android devices will have higher compatibility and fewer bugs for the near future.


5.0-3260 - AX: Implement loop_counter and support UCodes without LPF by delroth and MerryMage

Now we go from a gigantic change that should affect nothing in practice to a small change that'll make a huge difference for two very particular games. MerryMage has had quite the month!

This fixes HLE audio in Rogue Squadron 2 and 3.


Headphone users beware!

Someone finally took the potato chips away from the microphone.



...I guess we should actually explain how this fixes it. Some background: Dolphin's HLE audio has been completely rewritten since Dolphin 3.0; it's one of those miraculous transformations that we really tend to forget about. HLE audio has been so good for years now that the remaining problems stick out like a sore thumb. Rogue Leader and Rebel Strike are among the last games that sound very wrong with HLE audio, forcing users to take a stark performance hit by using LLE audio. While this may not have been a big deal when most titles required LLE audio for proper sound, it is far more noticeable now that these two stand nearly alone.

This is actually one of those situations where we've known what was wrong, but had no one to actually implement the fix. delroth reverse engineered the problem and documented what needed to be done, but it was a non-trivial amount of work.

Enter Citra audio guru MerryMage to implement the features outlined in order to finally fix HLE audio for these two titles. Considering these were Factor 5 games, you can bet some shenanigans were involved.


Factor 5 loves undocumented features

Factor 5 is the de facto badass of GameCube developers. Time and time again, they've taken advantage of features within the GameCube that few or even no other games seem to know exist. In this particular case, they took advantage of another seemingly undocumented feature dubbed "Loop Counter." The loop counter is actually fairly simple: every time looping audio loops, it increments by one. The only reason it's anything special is that it's entirely undocumented. Because it's undocumented, no one uses it... except Factor 5.

In terms of why the games fail so badly when it's not incrementing... we're not completely sure. Our best guess is that the games were using it for timings or to keep track of things, and since Dolphin wasn't incrementing it at all before, they were getting confused and loading garbage.
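The behavior described above can be sketched in a few lines. This is a toy model under stated assumptions: the `LoopingVoice` class, its field names, and the one-sample-at-a-time stepping are all hypothetical and do not mirror the real AX parameter block.

```python
from dataclasses import dataclass


@dataclass
class LoopingVoice:
    """Hypothetical stand-in for one looping audio voice."""
    loop_start: int
    loop_end: int          # last sample position that is part of the loop
    position: int
    loop_counter: int = 0

    def advance(self, samples: int) -> None:
        """Step through `samples` samples, wrapping at the loop end."""
        for _ in range(samples):
            if self.position == self.loop_end:
                self.position = self.loop_start
                self.loop_counter += 1  # one increment per completed loop
            else:
                self.position += 1


voice = LoopingVoice(loop_start=0, loop_end=3, position=0)
voice.advance(10)
print(voice.position, voice.loop_counter)  # 2 2
```

A game polling `loop_counter` would see it stay at zero forever if the emulator never performed that increment, which fits the guess that the games end up confused.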


Why did this take so long to get fixed???

Everything sounds so simple, right? But there's a second part of this issue that hasn't been mentioned yet. Star Wars Rogue Squadron II: Rogue Leader uses an earlier version of the microcode without a Low Pass Filter! In fact, where "Loop Counter" was stored in Rogue Squadron 2's microcode, "Low Pass Filter" was stored in Rogue Squadron 3's!

Thus, the big delay wasn't on supporting loop counter, it was on supporting multiple revisions of the AX-Microcode. Without the ability to support and run different versions of AX, the only way to support this is nasty hardcoded hacks. Before this point, Dolphin had no reason to differentiate between the AX microcodes, and thus there was no infrastructure in place to do it.

That's not to say Rogue Squadron 2 was the only game to use the earlier microcode. Other early GameCube titles, such as Super Monkey Ball, do use it. It's just that none of the version differences matter in that case.
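As a rough illustration of why version awareness matters, the sketch below maps the same storage location to different meanings depending on the microcode revision. The offset value and the version names `early_ax` and `later_ax` are invented for this example; only the loop counter / low pass filter clash comes from the description above.

```python
# Hypothetical field tables; the offset and version names are made up.
SHARED_OFFSET = 0x40

FIELDS_BY_VERSION = {
    "early_ax": {SHARED_OFFSET: "loop_counter"},
    "later_ax": {SHARED_OFFSET: "low_pass_filter"},
}


def field_at(version: str, offset: int) -> str:
    """Look up what a given offset means for a given microcode version."""
    return FIELDS_BY_VERSION[version][offset]


print(field_at("early_ax", SHARED_OFFSET))  # loop_counter
print(field_at("later_ax", SHARED_OFFSET))  # low_pass_filter
```

Without some lookup of this kind, code that always treats the location as a low pass filter would misread the earlier microcode's loop counter, and vice versa.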


rogueleaderdroids.jpg

Move along.


Don't get too excited...

As great a performance boost as this is for Rogue Squadron 2 and 3, we still don't find them especially playable for a variety of reasons.

The main culprit is Dolphin's GPU timings. Namely, we've been slowly getting more accurate with various timings, and that means games that previously worked with outright broken timings may begin to struggle a bit. In Dolphin 5.0, Rogue Squadron 2 and 3 were mostly stable in single core, albeit pretty slow. But since Dolphin 5.0, we've enforced SyncGPU in single core to improve GPU timings and push us closer to console-accurate timings. The two Factor 5 titles have been hurt the most by this, with F-Zero GX and Super Monkey Ball also seeing some major issues crop up.

One of our goals with enabling SyncGPU permanently was to bring out issues like these so we could improve timings and finally make these games perfectly stable. Until that happens, please be patient going forward. If we're ever going to have these games running properly, it's only going to be accomplished by rewriting the bad parts of Dolphin.


5.0-3285 - Frame: Fix macOS keyboard while emulation is running by MerryMage

The MerryMage show continues with a fix for a macOS bug. We broke the ability to type while the game is running on macOS. Whoops.


5.0-3305 - VideoCommon: rework anamorphic widescreen heuristic by ligfx

Something we've learned here at the blog over the years is that users absolutely love enhancements. While higher resolution support is by far the most used enhancement, widescreen hacks have become increasingly common. We've even begun documenting them on the wiki to help users find widescreen hacks for their favorite games!

Today, we're seeing some improvements to Dolphin's widescreen detection to help it cope with widescreen cheats. This adjusts the heuristic to avoid erroneous swaps, allowing for more complete widescreen hacks for The Legend of Zelda: The Wind Waker and other titles that use similar codes. This allows the menus to remain in widescreen mode without enforcing "Force 16:9".


windwakerwide-workingauto

If the user left the aspect ratio on Auto, this widescreen code resulted in the menu being vertically stretched to 4:3.

windwakerwide-workingauto

Now with the improved heuristic, users no longer need to worry about changing their aspect ratio just to play different games.



5.0-3345 - Pitch-Preserving Audio Stretching by MerryMage

This is a feature people have been requesting since synchronous audio was first standardized. Synchronous audio meant that when Dolphin isn't running full speed, audio would have gaps, resulting in an audible stutter. While the OpenAL backend has had time-stretching for years, there's a pretty big difference between just having time-stretching and having good time-stretching.

After having implemented audio in Citra and dealing with the challenges of time-stretched audio there, MerryMage was the perfect person to come in and use that knowledge to improve Dolphin's handling of audio under slowdown.


50fps without audio stretching warbles and pops horribly.

50fps with audio stretching is still slow, but much more pleasing to the ear.

Fullspeed for reference.



Sacrificing Latency for Quality

Time-stretching is a post-processing effect that stretches out already playing audio when the game isn't running full speed. This fills in would-be gaps and makes audio sound clearer during slowdown! With the new slider in the audio configuration page, you can tweak how much latency to allow. Higher latencies will keep audio smooth under more and more extreme slowdown. Unlike asynchronous audio, which caused severe audio glitches, game crashes, and other issues, time-stretching doesn't affect actual emulation whatsoever, and will not cause any issues for users that choose to use it.

Users who prefer the lowest possible latency can still leave this feature disabled.
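For a sense of scale, here is a small back-of-the-envelope sketch of how much stretching a given slowdown calls for. It only covers the arithmetic, not the pitch-preserving algorithm itself. The 60fps target and the sample counts are example values, and both function names are hypothetical.

```python
def stretch_ratio(actual_fps: float, target_fps: float) -> float:
    """How much longer each chunk of audio must last to cover real time."""
    if actual_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    speed = actual_fps / target_fps
    return 1.0 / speed


def stretched_length(samples: int, ratio: float) -> int:
    """Number of output samples after stretching `samples` input samples."""
    return round(samples * ratio)


# A game targeting 60fps that only manages 50fps (example values).
ratio = stretch_ratio(actual_fps=50, target_fps=60)
print(ratio)
print(stretched_length(40_000, ratio))
```

At 50fps out of 60, emulation produces only five sixths of the audio needed per real second, so each chunk has to last about 1.2 times as long. Doing that without lowering the pitch is the hard part, and it is where the extra latency buys quality.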


As of 5.0-3406, it is also available on Android.


5.0-3482 - [Android] Fix game banners by mahdihijazi

The images used to represent games in Dolphin's game list are called "banners". Nintendo mandated that every game include one to identify itself. You can see these banners for yourself when loading the GameCube Main Menu (BIOS) on your GameCube. Since they were mandated, they will always be there, so banners are a reliable way to represent a game.

On Android, Dolphin only used screenshots of the latest game save to represent games in its game list. While that's a nice feature, what happens if you don't have a save to make a screenshot? 5.0-3227 addressed this by using banners to fill the blanks until screenshots were created. Unfortunately, something was off: the colors were all wrong!


Androidtvgamelist-broken.png

Oh my god Mario and Luigi are suffocating!

Androidtvgamelist-working.png

Whew, good thing we know CPR.



Android was expecting images in a specific format, but Dolphin was just feeding it the normal RGBA data, which confused the poor Android into swapping red and blue. Now with this fix, the banners are changed into the format Android expects, and Android is rendering happily.
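The symptom, red and blue trading places, is what you get whenever 4-byte pixels are read in a different channel order than they were written. The sketch below shows that kind of conversion on raw bytes. The function name is hypothetical and the exact formats involved are an assumption, since they aren't spelled out here.

```python
def swap_red_blue(pixels: bytes) -> bytes:
    """Swap the 1st and 3rd byte of every 4-byte pixel; leave the rest alone."""
    if len(pixels) % 4 != 0:
        raise ValueError("expected 4 bytes per pixel")
    out = bytearray(pixels)
    out[0::4] = pixels[2::4]   # third channel moves into the first slot
    out[2::4] = pixels[0::4]   # first channel moves into the third slot
    return bytes(out)


# One opaque red pixel followed by one opaque blue pixel, in R, G, B, A order.
rgba = bytes([255, 0, 0, 255,   0, 0, 255, 255])
print(list(swap_red_blue(rgba)))  # [0, 0, 255, 255, 255, 0, 0, 255]
```

Applying the swap twice returns the original data, which is why a mismatch like this shows up as cleanly inverted reds and blues rather than as noise.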


5.0-3489 - JitArm64: Implement Conditional Register Cache by MerryMage

Over the past month, MerryMage has been pretty busy! This change implements CR Cache for the AArch64 JIT and gives a small performance boost. A game like Pikmin 2, which was hovering around 90 - 95% speed just a month ago, can now hit full speed in many scenes and areas on the NVIDIA Shield TV. Using the POV-Ray benchmark tool, we can show how much this affects a standardized test.


conditionregistercache.svg

This may not seem like much, but every little bit helps on Android!


How does this work? The PowerPC has 8 condition registers, but ARM and x86 only have 1. You cannot use other registers as condition registers, so Dolphin emulates this feature by storing any results of the condition registers into memory for retrieval later. While this certainly works, a feature that should be free and instant now eats up CPU cycles with load/store instructions, reducing performance. This change caches the results of the condition registers into the ARM CPU's general purpose registers, allowing immediate retrieval with fewer instructions.
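The saving can be modeled with a toy simulation that counts memory accesses with and without a cache. Everything here is a simplification made for illustration: the class names are hypothetical, host registers are treated as unlimited, and no real instructions are emitted.

```python
class MemoryBackedCR:
    """Every condition-field access goes through memory."""

    def __init__(self) -> None:
        self.memory = [0] * 8          # one slot per PowerPC condition field
        self.memory_accesses = 0

    def write(self, field: int, value: int) -> None:
        self.memory[field] = value
        self.memory_accesses += 1

    def read(self, field: int) -> int:
        self.memory_accesses += 1
        return self.memory[field]


class CachedCR(MemoryBackedCR):
    """Keeps fields in simulated host registers until flushed."""

    def __init__(self) -> None:
        super().__init__()
        self.host_regs: dict[int, int] = {}
        self.dirty: set[int] = set()

    def write(self, field: int, value: int) -> None:
        self.host_regs[field] = value   # no memory traffic yet
        self.dirty.add(field)

    def read(self, field: int) -> int:
        if field not in self.host_regs:
            self.host_regs[field] = super().read(field)  # load once
        return self.host_regs[field]

    def flush(self) -> None:
        """Write modified fields back, e.g. at the end of a code block."""
        for field in sorted(self.dirty):
            super().write(field, self.host_regs[field])
        self.dirty.clear()


def compare_and_branch_loop(cr: MemoryBackedCR, iterations: int) -> None:
    """Model a loop where a compare sets field 0 and a branch reads it."""
    for i in range(iterations):
        cr.write(0, i % 2)
        cr.read(0)


plain = MemoryBackedCR()
compare_and_branch_loop(plain, 100)

cached = CachedCR()
compare_and_branch_loop(cached, 100)
cached.flush()

print(plain.memory_accesses, cached.memory_accesses)  # 200 1
```

A real register cache has a limited number of host registers and must flush at the right points to stay correct, so the actual gain is far smaller than this idealized count suggests.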

So far this optimization has only been implemented in the AArch64 JIT, and is not yet in the x86-64 JIT.


5.0-3506 - Compressed Custom Texture Support by stenzek

Several amazing Custom Texture packs are available for Dolphin right now, including for popular games like The Legend of Zelda: The Wind Waker, Star Wars: Rogue Squadron 2 and Xenoblade Chronicles. These texture packs look fantastic and can give added detail to textures that don't look quite so up to snuff for today's gamers.

On the other hand, Dolphin's custom texture support had some caveats that made it difficult to use. To get the best performance, you had to use the prefetch custom textures option or else suffer from the GPU having to decompress textures on the fly. Unfortunately, prefetching and decompressing the textures would take tremendous amounts of RAM! Hytapia's Wind Waker texture pack wouldn't fit into 24GB of RAM! So users who wanted to use the texture pack without stuttering were left with downscaling textures or getting rid of textures they didn't absolutely need in order to fit it into their RAM.

Something that we knew could be done to alleviate this is Compressed Custom Texture Support, as forks of Dolphin have already proven it possible. By supporting compressed formats directly, we no longer have to decompress the textures in RAM before sending them to the GPU. This decreases the amount of RAM used by nearly 85%!



And even if you still don't have enough RAM to use the prefetch custom textures option, computers with high enough bandwidth should be able to use custom textures without prefetching and without any noticeable slowdown, thanks to skipping the decompression step.
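To see where savings of that size can come from, the sketch below compares an uncompressed 32-bit texture with two common block-compressed layouts. The specific formats are an assumption made for this example and are not named in this report: BC1 (DXT1) stores each 4x4 block in 8 bytes and BC3 (DXT5) in 16 bytes. Mipmaps are ignored.

```python
def rgba8_bytes(width: int, height: int) -> int:
    """Size of an uncompressed texture at 4 bytes per pixel."""
    return width * height * 4


def block_compressed_bytes(width: int, height: int, bytes_per_block: int) -> int:
    """Size of a texture stored as 4x4 pixel blocks."""
    blocks_wide = (width + 3) // 4    # round up to whole blocks
    blocks_high = (height + 3) // 4
    return blocks_wide * blocks_high * bytes_per_block


def savings(width: int, height: int, bytes_per_block: int) -> float:
    """Fraction of memory saved compared with uncompressed RGBA8."""
    raw = rgba8_bytes(width, height)
    compressed = block_compressed_bytes(width, height, bytes_per_block)
    return 1 - compressed / raw


print(savings(1024, 1024, bytes_per_block=8))   # 0.875 (BC1 / DXT1)
print(savings(1024, 1024, bytes_per_block=16))  # 0.75  (BC3 / DXT5)
```

A texture pack mixing both kinds of texture would land somewhere between 75% and 87.5%, which is in the same range as the figure quoted above.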


Last Month's Contributors...

Special thanks to all of the contributors that incremented Dolphin from 5.0-3253 through to 5.0-3570!


You can continue the discussion in the forum thread of this article.
