Will A.I. vocal software improve it's accuracy in the near future?

T Boog · March 18

Sorry if this isnt the right place to post this but I'd really like some educated guesses.

Ive been using an online AI voice service for my backing vocals. The tones of the voices sound great but the pronunciation often gets misinterpreted. It can be a pain to get something usable.

Since Im not a big computer person, I'm wondering if u guys expext to see the "listening accuracy" part of AI improving in the near future?

I'm hoping in, at least, the next couple years, AI will stop turning my "ahhs" into " oohs". Is that unrealistic or is that a probability?

Thanks 👍

noynekker · March 18

Yes, this is the wrong place for such a theoretical question. The discussion may continue once it's moved to the Coffee House forum. Artificial Intelligence is becoming more and more human, and humans are becoming more and more artificial. You're just wanting your artificial backup singers to get the pronunciation correct, and others are just wanting AI to write the music because they are only lyricists.

AI will just keep doing what we tell it to do until that day when AI starts telling us what to do.

T Boog · March 18

1 hour ago, noynekker said:

Yes, this is the wrong place for such a theoretical question

Understood, I do realize I framed the questions in a theoretical way but I was really hoping for some technical insights.

I understand that technology is always improving but sometimes things have limits. And it's hard to get my head around AI "learning to listen better" cause I don't really understand the workings, and therefore the limits to that technology.

Btw, I did try searching this online before I posted it and I got nothing. I also asked the question in a coffee house chat and no one responded. I think I'll just let the subject go. Thanks 🤙

noynekker · March 18

21 minutes ago, T Boog said:

I understand that technology is always improving but sometimes things have limits. And it's hard to get my head around AI "learning to listen better" cause I don't really understand the workings, and therefore the limits to that technology.

No worries @T Boog . . . if there is a need for something, someone out there will make a business out of fulfilling that need . . . so you've just voiced the need for it.

Amberwolf · March 18

Unrelated to audio, just graphics, but I have previously posted some thoughts (and complaints) about the way the tools I've tried out for my album cover "art" don't "listen" to the user very well, or at all. A general summary below:

I think the main problem with all of these tools is that none of the people creating them is familiar with the art process, terminology, etc., for each of the different types of tools, and so the tools don't "know" any of the terminology or processes required to be able to describe a specific result. Thus, the user can't specify things in a well-known way and get a predictable result.

Additionally, none of the tools are editably iterative. Meaning, you can't take the result you got, tell the tool what is wrong with it, and have it just go back and correct those things, leaving everything else the same. Instead, all of the tools start over and the user has to provide an updated descriptional guesswork that hopefully gets the tool to spit out a more useful result....

But it will never be the exact result desired, because it can't be edited, and it can't be specified correctly in the first place.

For instance, in an image tool, you cannot even specify point of view, lighting, camera focus, angles, relationships between objects, etc. Well, you can state them, but none of the tools obey any of them. You might get lucky and ask for something in a sunset lighting and get yellowish lighting but also get the sunset in the image even though you asked for it to be *lit by the sunset behind you*. Etc.

Most of the "cover art" on my bandcamp and soundclick stuff has been updated with "AI assisted" images...some of them are completely AI generated, but I've had to edit some of them significantly, or even composite various layers of various separately generated pieces to get a useful result (like the Gareki covers, and the banner image, etc).

Still, none of htem is the result I really wanted, but I don't have time to do the artwork *and* the music; each takes dozens ot hundreds of hours, so I have to pick one and that's the music. If I had money I'd pay a human to do the art for me, but I don't...the music will never earn me anything so it certainly won't pay for the art, and besides myself I have a shaggy fourlegged people-sized monster to feed....

Edited March 18 by Amberwolf

Amberwolf · March 18

Forgot this part:

I have also tried out a couple of sound-generation tools that work by inputting a text description, but neither one understands any type or form of any words that actually describe sounds, and cannot generate what is asked for, and again it can't edit the results, just start over with a result that is also wrong in a different or similar way.

I also tried one of the from-scratch music generation tools, just to see how bad it was, and it is just as bad as all the others, for exactly the same reasons--it cannot accept and use any terminology related to music creation or description, and it cannot edit the results only start over with yet another totally wrong result that can't be edited....

So they are all exercises in frustration if you are an actual artist trying to create a specific result. i wouldnt let any of these tools anywhere near my actual music; there is zero chance they'd do something useful to it.

I suppose if a user doesn't have a specific result in mind, and just wants something they don't have to put significant work into to fill a hole in something, they can be used, but otherwise... "not ready for prime time" is a huge understatement.

In conclusion: I don't think that any of these tools will really get better until they listen to the experienced people using them that actually *are* artists of that type, and remake the tools so they can be used correctly, *and* they make them able to edit them iteratively rather than starting over on every try.

It would help if the people creating the tools had any idea how to use such tools to create things with, since they would be able to build them to be used to create specific things instead of stuff that randomly happens to resemble the input description....sometimes.

Edited March 18 by Amberwolf

T Boog · March 18

6 minutes ago, Amberwolf said:

I don't think that any of these tools will really get better until they listen to the experienced people using them that actually *are* artists of that type, and remake the tools so they can be used correctly, *and* they make them able to edit them iteratively rather than starting over on every try.

😁

Amberwolf · March 18

Uh, yeah. About like that.

Amberwolf · March 18

FWIW, I wouldn't be able to use any of the output of these tools at all, except that I'm pretty good at "macgyvering" stuff (of any kind) together out of whatever I have available (a skill learned out of necessity from a young age and honed over the decades to a finely invisible edge ).

So like everything else I do, I take the bits it spits out that I see potential in, then chop them up and mix them with each other and other things and my own direct input and get something resembling what I would create if I actually had everything I *really* needed to make it.

T Boog · March 18

1 hour ago, Amberwolf said:

none of the people creating them is familiar with the art process, terminology, etc., for each of the different types of tools, and so the tools don't "know" any of the terminology or processes required to be able to describe a specific result. Thus, the user can't specify things in a well-known way and get a predictable result.

Additionally, none of the tools are editably iterative. Meaning, you can't take the result you got, tell the tool what is wrong with it, and have it just go back and correct those things, leaving everything else the same. Instead, all of the tools start over and the user has to provide an updated descriptional guesswork that hopefully gets the tool to spit out a more useful result....

But it will never be the exact result desired, because it can't be edited, and it can't be specified correctly in the first place.

Thanks Amberwolf. This was a great explanation btw. It helped me get a better picture of what's going on inside of AI and the challenges of programming it to that level of complexity.

But I do see hope in your statements too. It sounds to me like with some better & more complex programming, we will eventually see improvements.

I know AI has a long way to go but every little step forward will only make me happier. Cheers!

Edited March 18 by T Boog

T Boog · March 18

35 minutes ago, Amberwolf said:

I take the bits it spits out that I see potential in, then chop them up and mix them with each other and other things and my own direct input and get something resembling what I would create if I actually had everything I *really* needed to make it.

This is exactly what I do too. But it's, of course, time consuming and rarely ends up how I envisioned it.

I really look forward to AI getting smarter so I can use my "macgyvering" for other things. 😉

Amberwolf · March 18

58 minutes ago, T Boog said:

Thanks Amberwolf. This was a great explanation btw. It helped me get a better picture of what's going on inside of AI and the challenges of programming it to that level of complexity.

But I do see hope in your statements too. It sounds to me like with some better & more complex programming, we will eventually see improvements.

The thing that really has to happen is for the "programmer mentality" (I don't know a better phrase, but if you do I can replace this with it) to go away, and be replaced with the "serve the user" mentality. Virtually every programmer (individuals or companies) I've ever attempted to work with or suggest things to has a "vision" of what they want a program to do, but that almost never coincides with what a user of that program actually needs it to be able to do, or the way the user needs to be able to interact with it, and when it conflicts, the user virtually always loses.

There are individual feature exceptions here and there in some programs, but as a general rule it prevents every piece of software out there from being able to successfully or fully serve the purposes to which users need to put it.

Some software is much much worse than others, and grows less usable with every iteration despite more features being added.

Some software decisions are probably controlled by marketing or whatnot for stuff that's for sale, but even with opensource free stuff tthe same issues arise (though at least there, when they say "go fork yourself" it means something a bit more useful :laugh: ) .

I've attempted to communicate feedback to all of the developers of the AI tools I've tried out, with zero response from any of them yet. (some of the tools don't have a way to provide feedback).

The Google labs "image fx" tool *has* changed to include a version of some parts of some of the feedback I've given (but I'm certain that many others have given the same feedback), but the version of the bits they did include don't fix anything, and don't make it any more useful--some of it actually makes it harder to use with less predictable output, which makes it less useful overall. And none of the most important things have changed at all--it still doesn't understand even the most very basic bits of terminology, and doesn't follow what it sometimes knows consistently, so you still can't predict what you will get out of it, which makes it far less useful as a real tool, and like most of these things, more of a gimmick.

Amberwolf · March 18

1 hour ago, T Boog said:

This is exactly what I do too. But it's, of course, time consuming and rarely ends up how I envisioned it.

FWIW, its the same way I make stuff that uses vocals or real instruments, etc. Anything I can't draw midi notes for, and play from a synth I already have, I have to find or make or build or record audio clips of, then manipulate those to roughly approximate what I would *actually* put there if I could sing or play that instrument, etc. (well, if I could play *any* instrument).

I even end up doing this for the output of the synths--theere are many things not controllable via MIDI for them, or stuff that can't be done without editing the instrument itself to do those things and then running two copies of the synth (original plus edited), sometimes three or four to get the different versions.... So I end up rendering the synth out and manipulating the audio clip to do ti isntead, which doesn't necessarily give me exactly what I want either, but is far faster and simpler, and lets me get on with the other bits of stuff I want to create.

(that's the real problem with tools to create things--they have to be designed to make that creation easy and fast, to stay out of the way of the creative process and let the artist do what the artist needs to get done while the idea is still there...it's not like a mechanical process that has specific steps that must be performed in a certain way every time, like machining something on a lathe, so you know the exact things you must do and if you are interrupted you can always start from where you left off...creativity doesn't work like that).

1 hour ago, T Boog said:

I really look forward to AI getting smarter so I can use my "macgyvering" for other things. 😉

It's not so much "smarter" as "better designed for the purpose". The tools can only learn the types of things they're programmed to. If they aren't setup to learn things specific to the task at hand, they won't improve their ability to do the task, and that's the problem right now.

Most of these things are built as LLMs, large language models, whcih you can look up the specifics of how they work, but basiclly they take a huge database of "input' and then train the model with that data to create patterns in it's behavior. Then while in use by a user they may also learn things specific to that user or that task, but they are not human and cannot learn like a human can, where you can just explain to them what they're doing wrong and how they should do it correctly, and fix a problem that way.

I don't yet know enough to even begin creating one, but eventually my wolfy-bot project will use a type of one of these things to control it's behavioral patterns so that it can learn like a dog or wolf does, from interactions with the user, and can be trained like a real canine can, based on "instincts" that are already preprogrammed into it that "reward" it for correctly learning something, for instance.

The catch with present versions of any of these systems is that they are complex black boxes--even the programmers have no real idea of what is going on inside them to take a specific set of inputs and cause a specific set of outputs. So there is no way to go in and edit a "behavior" in any of them. If it really learns something "wrong" that you don't want in the model, you'd have to erase the whole thing and start over from a backup of the model from before it learned that.

There should be a way to backup and insert specific...well, I'll call them engrams, or behavioral routines, but at present i'ts like our own brains--we don't know which "neurons" and paths between them are actually used for any specific behavior or bit of "knowledge', etc. If a model is designed for it, you could retrain it to do a different behavior for an input but just like in actual brains the old paths and data are still there, so if they are triggered by a specific set of inputs and conditions, the old behavior could still be used even though you never want that to happen.

At their cores, you could think of these paths like a giant slanted table with dimples and bumps in it that you roll balls down, so that the balls are deflected down paths that have deeper "channels" more often than shallower ones, and steered away from paths that have higher bumps. Every reinforcment of a path increases the size of the dimples and bumps on the way to and thru that path. But...you can't see any of those bumps or paths from outside, can't see where the ball actually goes while on the table, so you can't note down pathways and manually change them, can't back them up or copy them individually to a different table, etc. You can also artifically alter the bumps or paths, but not knowing which specific ones they're for you can only make general changes (kinda like the sliders on the attitude/ability control tablets for the AIs in Westworld).

You can copy an entire whole AI database of behaviors, but not pick and choose individual ones. That's something that I know *could* be changed, but hasn't been yet (and I am not a programmer so I don't know how, but there's no reason the system couldn't be designed to be able to do this--it just hasn't been yet).

pedwal wally wally wha · March 18

most "ai" is just marketing, real ai is still in early alpha stages even if it's already been unleashed on the world, real ai has to "grow" and "learn" which takes quite some time, and the ai will only be turned off (and reprogrammed and restarted) if it's seen to be going "really wrong" - a friend described turning them off to reprogram, after watching them grow so far, as like killing your children 😬 it's going to get very interesting in the next decade or so

Byron Dickens · March 18

T Boog · March 18

Cheers Byron. I don't care if HAL takes over the whole galaxy. As long as he can sing awesome background vocals, I'm down for it 😉

Byron Dickens · March 18

HAL sounds kinda monotone.

T Boog · March 18

20 minutes ago, Byron Dickens said:

HAL sounds kinda monotone

That's perfect. My voice sounds monotone too 👍

T Boog · April 18

Cheer guys. Im not necessarily trying to revive this thread, I just wanted to post this because I thought it was pretty cool:

I asked ChatGPT,...

"I sometimes use AI voices when I record, but I get lots of anomalies. Will the technology get better in the near future?"

ChatGPT said, " Yeah, the tech is already improving fast, and it’s only going to get better.

Right now, AI voice generation can still have weird artifacts — like inconsistent tone, unnatural pacing, or strange pronunciation — especially if the voice model wasn’t trained on diverse data or if it's being pushed to do something complex (like emotional range, multiple accents, or long reads). But:

Voice models are evolving quickly — newer ones are much more natural and expressive.

Personalized voice cloning is getting better too — meaning more realistic and consistent voices tailored to you or a character.

Real-time corrections and better editing tools are being integrated into platforms, so cleaning up anomalies will be easier and faster.

So yeah, the weirdness you're experiencing now will likely fade out as tools become smoother and more advanced".

Next, I asked ChatGPT, "Why does my left foot have 7 toes and do women find that attractive?"

But I'll save that for another thread 😄

mettelus · April 19

The voice is an incredibly complex instrument. If you delve into the weeds with a voice, there are multiple harmonics and resonators that can be working at once. While those can be copied/analyzed, it is the "from scratch" part that creates the challenges beyond the basics. There is a quick snippet in the "Never Surrender: A Galaxy Quest Documentary" where Enrico Colantoni (who played the alien leader) talks about the tonal practice he was taught for voice getting him the part (he was initially turned down, but did that as he was leaving the audition and they changed their minds). If you analyze a voice that is "full," there is often a range of that voice where distinct harmonics exists ("chest" voice, "middle" voice, and "head" voice all in play at the same time).

A really big challenge with generation are diphthongs (vowels that are really two, like the long "I," which is really "ah-ee")... I tested Synthesizer V and 2 of the three free voices had problems with "why," the other had an issue with "you"... ironically, the first three words of the chorus I was testing are "Why don't you..." so that stopped my test fairly quick. I have not tried it a second time with the newer release.

For replacement, that is simpler (in some ways), but requires a voice model that is accurate/well sampled. DaVinci Resolve just released 20 Beta that has both a voice modeler and replacement algorithm that I tested with singing, and does fairly well, but the obvious downside of that if you MUST have permission to use a voice to use it. This doesn't fall under generation, and in some ways has more challenges in the the harmonics of the model are going to be based on the piece it was sampled from (again, the complexity of voice itself). It probably won't be too long before the AI voice generation gets these intricacies ironed out.

On 3/18/2025 at 1:17 AM, Amberwolf said:

Additionally, none of the tools are editably iterative. Meaning, you can't take the result you got, tell the tool what is wrong with it, and have it just go back and correct those things, leaving everything else the same. Instead, all of the tools start over and the user has to provide an updated descriptional guesswork that hopefully gets the tool to spit out a more useful result....

Quick rewind. I forgot to comment on this earlier, then forgot which thread it was in! Corel "put a pause" on development for Painter, PaintShop Pro, and VideoStudio last year (there are no "2024" versions), so when the Humble Bundle hit, I didn't even comment about it. However, what they did release that was "new" was the Distinct Plugins they incorporate as extras. Of those, Vision FX 2.0 is the most notable, and does have the ability to build on existing images. Quick synopsis of that guy:

(Both a pro and con) It is local, not cloud-based, so all processing is done locally, and can put a strain on your machine for high-detail pics (can take a few minutes per pic to generate, and defaults to doing 5... after 5 are done, you can change the text/setting for #5 only if you want to do just one, but have to have them all completed to do that... OR... you can cancel it when it shifts to #2 and restart the whole process... pita either way).
(Pro) It works from an image, but only two sliders are available, basically "how much can it alter the original image" and "how much must it follow the text description." For generation, you can use a blank image and say "modify this 100%," with existing images knock that down. I went through a phase of testing medusa images (nice challenge for AI), so a lot of that is the text description... as mentioned above, there is no "definitive list" of what it understands, so there is a lot of hit-or-miss here. Since is local, best to test run on low resolutions/passes to see if the returns are right, then jack up those settings. I also tested with screen caps of AI images posted online and it improved them (dramatically in some cases, as I wanted to add photo-realistic qualities). It does seem to follow some camera/lighting specifics, but this can be hit-or-miss as well.
(Con) It makes 5 images per pass, but only one is "save-able." You can screen cap the others to save (more a pita really).
(Con) No good documentation, so the keywords and artists recognized are a Hail Mary. Oddly enough, it seems "youthful portrait" is baked into it (there is a pic on their website "somewhere" of a elderly woman rendered as a young girl... well, I'll be... I actually found it, lol).
(Con) The "seed" is not select-able, it just cycles randomly when you want to change it.
(Pro) The samples on the gallery include the text prompts used (similar to other online sites).

Overall it has its uses (specifically if tweaking an existing image), and being a local plugin, it is a perpetual license. My apologies for not posting about that when was in effect (that bundle was $30 but the only thing new to it was the Distinct Plugins stuff (and Corel DRAW basic, IIRC). The rest was "2023" stuff, and brushes (both previously offered).

Quick edit: If you scroll down on that "youthful" portrait above (or the one below), you can see "related" images... which have a slider for the before/after image. Another example of a good tweak is this one.

Edited April 19 by mettelus

Will A.I. vocal software improve it's accuracy in the near future?

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Please sign in to comment