When done right, there aren't any conversion artifacts that wouldn't be much worse if you worked at 44.1 kHz and 16 bits all through the rendering pipeline. The hard part about sample rate conversion is finding the best-sounding low-pass filter for the material, which is solved simply by using one of the popular sample rate conversion tools such as SoX, Audacity, or r8brain, or any of the other good ones compared at http://src.infinitewave.ca/.
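For what it's worth, the resampling step itself is a one-liner with any decent library; the low-pass filter design is what the tools above compete on. A minimal sketch using SciPy's polyphase resampler (just one off-the-shelf option, with example rates I've assumed: a 96 kHz source going to 44.1 kHz):

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 96000, 44100  # assumed example rates
# 44100/96000 reduces to 147/320, so resample by that rational factor.
x = np.sin(2 * np.pi * 1000.0 * np.arange(fs_in) / fs_in)  # 1 s of a 1 kHz test tone
y = resample_poly(x, up=147, down=320)  # applies its own anti-alias low-pass filter
```

The anti-aliasing filter is built into the polyphase step, which is exactly the part where converters differ in quality.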
Converting from 24 bit to 16 bit has no artifacts either. You either simply truncate the samples, which raises the noise floor (at 16-bit resolution this can, for example, audibly distort long reverb tails at high listening levels), or you apply dither, which exploits the statistical distribution of signals below the 16-bit noise floor: it adds a tiny bit of noise, but that noise carries the audible information that truncation would destroy, effectively lowering the perceived noise floor even further. It's the same reason gradients are dithered in digital imaging: to avoid banding with hard edges.
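A rough sketch of both options in Python, with helper names I've made up for illustration. Truncation just floors samples to the 16-bit grid; TPDF dither adds about one LSB of triangular noise before rounding, which is the classic textbook approach (real converters may also noise-shape):

```python
import numpy as np

def to_16bit_truncate(x):
    """Truncate float samples in [-1, 1) to the 16-bit grid (no dither)."""
    return np.floor(x * 32768.0).astype(np.int16)

def to_16bit_tpdf_dither(x, rng):
    """Quantize with TPDF dither: the sum of two uniforms gives +/-1 LSB triangular noise."""
    d = rng.random(x.shape) - rng.random(x.shape)  # triangular PDF in (-1, 1) LSB
    return np.clip(np.round(x * 32768.0 + d), -32768, 32767).astype(np.int16)

# A sine well below the 16-bit LSB (like the tail of a reverb): truncation
# flattens it into 0/-1 steps, while dither keeps it recoverable as a clean
# tone buried in low-level noise.
rng = np.random.default_rng(0)
t = np.arange(44100) / 44100.0
quiet = 0.4 / 32768.0 * np.sin(2 * np.pi * 440.0 * t)  # ~0.4 LSB peak amplitude
trunc = to_16bit_truncate(quiet)
dith = to_16bit_tpdf_dither(quiet, rng)
```

After truncation the sub-LSB tone is gone (only hard 0/-1 steps remain, i.e. distortion); the dithered version still correlates strongly with the original sine.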
Not all advantage is being dumped, and the conversion doesn't add garbage. Unless you consider dithering garbage, in which case you just leave it off without losing anything that would have been there in the first place.
People used to compare this to saying, "It doesn't make sense to shoot a movie on 35mm film stock when the movie is going to be released straight to VHS." The point is to keep production quality as high as you can, for as long as you can, from the beginning of the production process, so you end up with the best possible end product. That way you capture the most information at data acquisition and lose the least of it in the steps further down the line.
Best,
Michael