Mannymac

Potential CPU optimisation for Ryzen CPUs


Through experimentation I found a potential tweak for high thread count Ryzen CPUs such as my Ryzen 3950x.

With this tweak, a session that crackled at 90% engine load now runs without crackles at the same load. The tweak is to set the mix thread count in the config file to the number of physical cores in your CPU, not the number of logical ones.

My Ryzen 3950x has 16 "real" cores/ 32 logical ones.
Go into the config file under settings and set the mix thread count to your real cores (16 in my case), and set the thread scheduling model to 2. Et voilà!
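To make the arithmetic concrete, here is a minimal sketch of how the two values could be derived. It assumes every physical core exposes the same number of SMT threads (2 on a 3950x), and the key names `MixThreadCount` and `ThreadSchedulingModel` are my best reading of how the settings are spelled, so check them against your own config file. The helper names are made up for illustration.

```python
import os


def physical_core_estimate(logical_cores: int, threads_per_core: int = 2) -> int:
    """Estimate physical cores, assuming a uniform SMT factor (an assumption)."""
    if logical_cores < 1 or threads_per_core < 1:
        raise ValueError("core counts must be positive")
    return max(1, logical_cores // threads_per_core)


def suggested_settings(logical_cores: int,
                       threads_per_core: int = 2,
                       scheduling_model: int = 2) -> dict:
    """Return the two values described in the post, keyed by assumed setting names."""
    return {
        "MixThreadCount": physical_core_estimate(logical_cores, threads_per_core),
        "ThreadSchedulingModel": scheduling_model,
    }


if __name__ == "__main__":
    # os.cpu_count() reports logical processors, not physical cores.
    logical = os.cpu_count() or 1
    for key, value in suggested_settings(logical).items():
        print(f"{key}={value}")
```

For a 16-core/32-thread CPU, `suggested_settings(32)` gives a mix thread count of 16 and scheduling model 2, which matches the values above. On a CPU without SMT, pass `threads_per_core=1`.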

 

Let me know if this improves things for you.


Interesting. I don't know much about the internal architecture of that CPU but are the virtual cores running at lower speed than the physical ones? 
That could explain why this helps because if a high workload is assigned to a low powered core it will delay the rest of the processing by that amount.
In real time multithreaded streaming the thread that takes the longest processing time will scale down the performance since all other threads have to wait for it to complete before a buffer is complete.
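A rough sketch of that last point, with invented per-thread timings: the time available per buffer is the buffer size divided by the sample rate, and the buffer is only complete when the slowest thread has finished. The function names are hypothetical and this is not how the engine is actually implemented.

```python
def buffer_period_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Time available to process one buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


def buffer_completion_ms(thread_times_ms: list) -> float:
    """All threads must finish, so the slowest one sets the completion time."""
    return max(thread_times_ms)


def buffer_finishes_on_time(thread_times_ms: list, period_ms: float) -> bool:
    """True if the slowest thread still beats the buffer deadline."""
    return buffer_completion_ms(thread_times_ms) <= period_ms


if __name__ == "__main__":
    period = buffer_period_ms(4096, 96000)  # roughly 42.67 ms
    # Illustrative timings only: two fast threads and one slow one.
    print(buffer_finishes_on_time([10.0, 12.0, 40.0], period))  # slow thread just makes it
    print(buffer_finishes_on_time([10.0, 12.0, 45.0], period))  # slow thread misses the deadline
```

In the second case the two fast threads finished with plenty of time to spare, but the buffer is still late, which is what you would hear as a crackle.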


Indeed very interesting.

Big thank you for allowing us to go that deep into the engine.

I'm afraid I don't know exactly what is going on there. I do have a lot of Acustica plugins, so maybe limiting the cores forces them to "stay in one lane"?

I can send you my session if you want to analyse it.

 

Also, since you are here: would it ever be possible to implement something in Cakewalk like the "lookahead" processing that Reaper has?

For instance, my engine load is 90% but my CPU load is only 20%, so there are reserves if they could be addressed. That's at 96 kHz with a 4096 buffer.
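One way such a gap can arise, sketched with invented numbers: if the engine figure follows the busiest core while the CPU figure is an average over all logical cores, a handful of busy cores on a 32-thread CPU barely moves the average. This is an assumption about how the two meters behave, not a description of how either is actually computed.

```python
def average_cpu_load(per_core_busy: list) -> float:
    """Average busy fraction across all logical cores (what a CPU meter tends to show)."""
    return sum(per_core_busy) / len(per_core_busy)


def busiest_core_load(per_core_busy: list) -> float:
    """Busy fraction of the most loaded core (what limits real-time audio)."""
    return max(per_core_busy)


if __name__ == "__main__":
    # Illustrative only: 32 logical cores, one at 90%, seven at 80%, the rest idle.
    per_core = [0.9] + [0.8] * 7 + [0.0] * 24
    print(f"busiest core: {busiest_core_load(per_core):.0%}")
    print(f"average CPU:  {average_cpu_load(per_core):.0%}")
```

With these numbers the busiest core sits at 90% while the average is about 20%, so the "reserves" are real but only usable if the work can be spread across more cores.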


Yes, it's possible that the other virtual cores are being preempted by the OS and are not able to process as much as the main cores. @Jim Roseberry may know more about that CPU.

Lookahead is really pre-buffering in parallel. It's possible for us to look into it at some point. Of course, that will cause latency when playing, since audio is being processed ahead of time.


Hi Noel,

That would be a trade off I am happy to have in mixing :)

Could be especially useful at higher sample rates. I am getting more and more of these sessions for blu ray audio.

 

As an aside for further thought: limiting to 16 cores only brought benefits at 96 kHz. At 48 kHz the performance is worse. Very interesting. My guess is that there is some RAM/CPU communication going on here, which Ryzen-based systems seem to be very sensitive to.


Historically I've found Intel multicore systems to be more deterministic with realtime apps. But I've been out of the loop with AMD for awhile so don't know much about how they stand up today.


Last time I was testing the 3950x and 3970x, I remember seeing a performance boost using the "Aggressive" Thread Scheduling Model.

I also remember some issues when using the Aggressive Mode. 

 

I've been testing the new 5950x.

AMD has ***finally*** solved their performance issue at ultra low latency settings.

I'll revisit the Thread Scheduling Models... and give some details about performance.


the physical cores are generally responsible for IO activities, so it makes sense that the setting needs to be the physical cores. the "virtual cores" are simply some variant of a hyperthreading model to timeshare the physical core (since most software steps, RAM access, and IO activities take much longer than a thread swap, the manufacturers figure they can get away with that approach).

Intel definitely seems to have a better handle on this than AMD but at about 2x the price for the equiv CPU...

couple of other notes - most times a single core affinity is assigned for the OS as well as network control, so if you have the option to set core 0 to not be used by the software (sometimes this is the park control, or elsewhere...), this can help avoid issues with unresponsive UI etc.

(disclaimer, 8-9 years ago i spent 6 months working closely with Microsoft, HP, Intel, and SAP, and the MS SQL product team on optimization of SQL Server and physical IO for solving the record locking and concurrent disk IO issues for a 25K concurrent user SAP test POC) (and it's ok if none of that makes sense 🙂 ) 

Edited by Glenn Stanton

3 hours ago, Jim Roseberry said:

I'll revisit the Thread Scheduling Models... and give some details about performance.

Thanks Jim I can't wait to hear your thoughts.  It is good to have competition. It certainly took AMD long enough ;)

On 11/28/2020 at 9:08 PM, Glenn Stanton said:

the physical cores are generally responsible for IO activities, so it makes sense that the setting needs to be the physical cores. the "virtual cores" are simply some variant of a hyperthreading model to timeshare the physical core (since most software steps, RAM access, and IO activities take much longer than a thread swap, the manufacturers figure they can get away with that approach).

Citation for this?  From a Windows perspective (slight simplification), there are just cores, not virtual (logical) and physical, so it can't assign a thread to a physical core as there's no such thing.

A bit old, but clear: https://techgearoid.com/articles/difference-between-physical-cores-and-logical-processors/


Yes you can set a preferred core using the SetThreadIdealProcessor function. We in fact have support for this.
There are CPU architectures where some cores are efficiency cores and others are high-performance cores. While this is very common in mobile silicon (phones, iPad, etc.), it is now becoming mainstream on other CPUs as well. AMD's hyperthreading was different from Intel's (not sure about their current offerings), and hyperthreading was not as efficient before. Also, the new Apple silicon has low-powered cores and high-powered ones.
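For anyone curious what calling that function looks like, here is a minimal sketch using Python's `ctypes`. It only does something on Windows, the wrapper name is made up, and it is not how Cakewalk uses the call internally. Note that the ideal processor is a scheduling hint, not a hard affinity: Windows prefers that processor when it can but is free to run the thread elsewhere.

```python
import ctypes
import sys


def set_ideal_processor(cpu_index: int):
    """Hint that the calling thread prefers the given logical processor.

    Returns the previous ideal processor on Windows, or None on other
    platforms, where this sketch does nothing.
    """
    if sys.platform != "win32":
        return None

    from ctypes import wintypes  # imported here so the module loads on any OS

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    kernel32.SetThreadIdealProcessor.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetThreadIdealProcessor.restype = wintypes.DWORD

    previous = kernel32.SetThreadIdealProcessor(kernel32.GetCurrentThread(), cpu_index)
    if previous == 0xFFFFFFFF:  # (DWORD)-1 signals failure
        raise ctypes.WinError(ctypes.get_last_error())
    return previous


if __name__ == "__main__":
    print(set_ideal_processor(0))
```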


it is simply the underlying physical architecture, memory channels, IO channels, inter-core communications, pre-execution queuing etc. so as much as Windows virtualizes the hardware, there is definitely affinity to the physical cores because at the end of the day the OS and its critical services need it in order to ensure it can do its job. personally, i have to admit i have not looked too deeply into Windows 10, but i did set up some park control and affinity settings so my quad cores are as optimal as i can get them. perhaps it's not needed although as Noel pointed out they support this in the software...

Edited by Glenn Stanton


But there is no physical core (as opposed to logical) as such - a physical core has 1 or more logical cores.  All logical cores on one physical core share the same physical hardware, registers etc and to communicate to the outside world.

46 minutes ago, Kevin Perry said:

But there is no physical core (as opposed to logical) as such - a physical core has 1 or more logical cores.  All logical cores on one physical core share the same physical hardware, registers etc and to communicate to the outside world.

There are differences between logical and physical cores because the OS handles them differently, especially in the case of hyperthreading or, in AMD's case, simultaneous multithreading (SMT). There are APIs to identify whether a core is logical or physical. See GetLogicalProcessorInformation.

See this benchmark that has more details on how hyperthreaded cores can affect performance.
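As a rough illustration of what GetLogicalProcessorInformation tells you: for each physical core it returns an entry with a processor mask, where each set bit is one logical processor belonging to that core. The sketch below does not call the API itself; it only shows how such masks would be interpreted, and the mask values are made up.

```python
def count_cores(core_masks: list) -> tuple:
    """Return (physical, logical) counts from per-core processor masks.

    Each mask stands for one physical core; each set bit in it is one
    logical processor that shares that core.
    """
    physical = len(core_masks)
    logical = sum(bin(mask).count("1") for mask in core_masks)
    return physical, logical


def logical_siblings(mask: int) -> list:
    """List the logical processor indices encoded in one core's mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


if __name__ == "__main__":
    # Hypothetical 2-core CPU with SMT: core 0 owns logical 0-1, core 1 owns logical 2-3.
    masks = [0b0011, 0b1100]
    print(count_cores(masks))
    print([logical_siblings(m) for m in masks])
```

Two bits set in a mask means the core has SMT enabled; one bit means it is running a single thread per core.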

1 hour ago, Glenn Stanton said:

it is simply the underlying physical architecture, memory channels, IO channels, inter-core communications, pre-execution queuing etc. so as much as Windows virtualizes the hardware, there is definitely affinity to the physical cores because at the end of the day the OS and its critical services need it in order to ensure it can do its job. personally, i have to admit i have not looked too deeply into Windows 10, but i did set up some park control and affinity settings so my quad cores are as optimal as i can get them. perhaps it's not needed although as Noel pointed out they support this in the software...

What I can vaguely remember is that with the introduction of the Windows 10 2020 spring update multicore processing incl. AMD was better supported. I've got no clue if SMT is also better supported from that moment on, but it would surprise me if Microsoft is not working on better ZEN support considering Ryzen's growing market share.
 

On 11/26/2020 at 12:25 AM, Noel Borthwick said:

Historically I've found Intel multicore systems to be more deterministic with realtime apps. But I've been out of the loop with AMD for awhile so don't know much about how they stand up today.

I would highly appreciate it if CbB could dive into this in order to ensure the best possible performance with the ZEN architecture. I've been playing around a bit with the settings (not enough time to really go deep into it, unfortunately) and recently noticed better stability with the latest thread scheduling model 3, but I can't tell yet whether it is faster or performs better under demanding workloads (Threadripper 1950x).
I hope to be able to switch to a Ryzen 9 5950X next year, so any CbB compatibility improvement is welcome.

How to use mmcss-test:

Double-click it to run it (by default, this will test 128 threads).
It is possible to use this tool to test any number of threads:

Type run in Windows Search
Click on "Browse"
Locate mmcss-test.exe (e.g. on your desktop)
Add a space character and the number of threads (e.g. C:\Users\<username>\Desktop\mmcss-test.exe 512)
You can use it to test the limitation as well as to check that limiting the number of cores used has been applied correctly. The tool will output the number of MMCSS priority threads and the number of threads that failed to set MMCSS priority.

16 hours ago, Noel Borthwick said:

There are differences between logical and physical cores because the OS handles them differently, especially in the case of hyperthreading or, in AMD's case, simultaneous multithreading (SMT). There are APIs to identify whether a core is logical or physical. See GetLogicalProcessorInformation.

See this benchmark that has more details on how hyperthreaded cores can affect performance.

That's true but within a CPU with HT on, the  2 "cores" are sharing the same hardware.

6 hours ago, Kevin Perry said:

That's true but within a CPU with HT on, the  2 "cores" are sharing the same hardware.

yes, hence the problem - which process gets priority on the execution side? OS and critical services (like file system, network, etc) - likely the OS - it's almost inevitable that one of the physical cores will get affinity set for OS which means one of the logical cores will operate impaired at best. and now with multiple cores on a chip plus issues with race-condition/timing issue security hacks, we'll see more physical cores in use than SMT/HT - certainly for the OS and related services. the cost is more power consumption and heat... but i'd like to have a CPU with 24 physical cores and 1TB RAM to make life a bit easier 😉 

On 11/25/2020 at 12:51 PM, Mannymac said:

Through experimentation I found a potential tweak for high thread count Ryzen CPUs such as my Ryzen 3950x.

With this tweak, a session that crackled at 90% engine load now runs without crackles at the same load. The tweak is to set the mix thread count in the config file to the number of physical cores in your CPU, not the number of logical ones.

My Ryzen 3950x has 16 "real" cores/ 32 logical ones.
Go into the config file under settings and set the mix thread count to your real cores (16 in my case), and set the thread scheduling model to 2. Et voilà!

 

Let me know if this improves things for you.

 

Logical cores are due to SMT (simultaneous multithreading) - basically, AMD's version of Hyper-Threading, which is Intel's name for the same technique.

This has existed since the days of the Pentium 4, back in 2002 or so.

It used to be common that people recommended turning off SMT for Audio applications, but I don't know how this guidance applies today.  Many DAWs have been updated to properly support this.  It's possible that some haven't...

I get massive issues with Focusrite interfaces not playing back audio cleanly on my desktop, and it's possible that this is due to SMT.  Samplitude has similar issues on my Laptop, which is a 4C/8T Intel i7 CPU using the same interface.  Some PCs let you turn SMT off, but most laptops do not 😞

Historically, SMT was great for e.g. Video Editing, but had a habit of introducing issues in DAWs.

Edited by Maestro


This is a good article (I think) about the whys and wherefores of SMT/HT: https://bitsum.com/tips-and-tweaks/why-you-should-not-disable-hyper-threading-or-why-you-should/

The scheduler and the architecture have changed: previously, Windows had a "habit" of using logical cores on the same physical core rather than spreading threads out over logical cores on different physical cores.

