
Will A.I. vocal software improve its accuracy in the near future?



Sorry if this isn't the right place to post this, but I'd really like some educated guesses.

I've been using an online AI voice service for my backing vocals. The tones of the voices sound great, but the pronunciation often gets misinterpreted, and it can be a pain to get something usable.

Since I'm not a big computer person, I'm wondering whether you expect the "listening accuracy" part of AI to improve in the near future.

I'm hoping that within the next couple of years, at least, AI will stop turning my "ahhs" into "oohs". Is that unrealistic, or is it a real possibility?

Thanks 👍


Yes, this is the wrong place for such a theoretical question. The discussion may continue once it's moved to the Coffee House forum. Artificial Intelligence is becoming more and more human, and humans are becoming more and more artificial.  You're just wanting your artificial backup singers to get the pronunciation correct, and others are just wanting AI to write the music because they are only lyricists.

AI will just keep doing what we tell it to do until that day when AI starts telling us what to do.


1 hour ago, noynekker said:

Yes, this is the wrong place for such a theoretical question

Understood. I do realize I framed the question in a theoretical way, but I was really hoping for some technical insights.

I understand that technology is always improving, but sometimes things have limits. And it's hard to get my head around AI "learning to listen better" because I don't really understand the workings, and therefore the limits, of that technology.

Btw, I did try searching this online before I posted it and I got nothing. I also asked the question in a coffee house chat and no one responded. I think I'll just let the subject go. Thanks 🤙


21 minutes ago, T Boog said:

I understand that technology is always improving, but sometimes things have limits. And it's hard to get my head around AI "learning to listen better" because I don't really understand the workings, and therefore the limits, of that technology.

No worries @T Boog . . . if there is a need for something, someone out there will make a business out of fulfilling that need . . . so you've just voiced the need for it.


Unrelated to audio, just graphics, but I have previously posted some thoughts (and complaints) about the way the tools I've tried out for my album cover "art" don't "listen" to the user very well, or at all.   A general summary below:

I think the main problem with all of these tools is that none of the people creating them are familiar with the art process, terminology, etc., for each of the different types of tools, so the tools don't "know" any of the terminology or processes required to describe a specific result. Thus, the user can't specify things in a well-known way and get a predictable result.

Additionally, none of the tools are editably iterative. Meaning, you can't take the result you got, tell the tool what is wrong with it, and have it go back and correct just those things, leaving everything else the same. Instead, all of the tools start over, and the user has to provide an updated description and guess at what will get the tool to spit out a more useful result.

 

But it will never be the exact result desired, because it can't be edited, and it can't be specified correctly in the first place.
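Just to make "editably iterative" concrete, here is a rough sketch of the kind of loop I mean. The tool name, functions, and arguments below are entirely hypothetical, made up for illustration; no tool I've tried offers anything like it:

```python
# Hypothetical sketch only -- ImageTool, generate(), and revise() are made-up names,
# not any real product's API. The point is the revise() step, which current tools lack.

class ImageTool:
    def generate(self, description: str) -> dict:
        """Produce a first attempt from a text description."""
        return {"description": description, "corrections": []}

    def revise(self, result: dict, corrections: list) -> dict:
        """Fix ONLY the listed problems, leaving everything else in the result alone.
        Today's tools can't do this; they throw the result away and start over."""
        result["corrections"].extend(corrections)
        return result

tool = ImageTool()
draft = tool.generate("desert road at dusk, lit by the sunset behind the viewer")
draft = tool.revise(draft, ["take the sun out of the frame", "keep the warm lighting as-is"])
print(draft)
```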

 

For instance, in an image tool, you cannot even specify point of view, lighting, camera focus, angles, relationships between objects, etc. Well, you can state them, but none of the tools obey any of them. You might get lucky and ask for something in sunset lighting and get yellowish lighting, but also get the sunset in the image even though you asked for it to be *lit by the sunset behind you*. Etc.

 

Most of the "cover art" on my bandcamp and soundclick stuff has been updated with "AI assisted" images...some of them are completely AI generated, but I've had to edit some of them significantly, or even composite various layers of various separately generated pieces to get a useful result  (like the Gareki covers, and the banner image, etc).

Still, none of them is the result I really wanted, but I don't have time to do the artwork *and* the music; each takes dozens to hundreds of hours, so I have to pick one, and that's the music. If I had money I'd pay a human to do the art for me, but I don't... the music will never earn me anything, so it certainly won't pay for the art, and besides myself I have a shaggy four-legged, people-sized monster to feed....


Forgot this part:

 

I have also tried out a couple of sound-generation tools that work by inputting a text description, but neither one understands any words that actually describe sounds, so they cannot generate what is asked for. And again, you can't edit the results, just start over with a result that is also wrong in a different or similar way.

 

I also tried one of the from-scratch music generation tools, just to see how bad it was, and it is just as bad as all the others, for exactly the same reasons--it cannot accept and use any terminology related to music creation or description, and it cannot edit the results, only start over with yet another totally wrong result that can't be edited....

 

So they are all exercises in frustration if you are an actual artist trying to create a specific result. I wouldn't let any of these tools anywhere near my actual music; there is zero chance they'd do something useful to it.

 

I suppose if a user doesn't have a specific result in mind, and just wants something they don't have to put significant work into to fill a hole in something, they can be used, but otherwise... "not ready for prime time" is a huge understatement. 

 

In conclusion: I don't think any of these tools will really get better until their makers listen to the experienced people using them who actually *are* artists of that type, and remake the tools so they can be used correctly, *and* make them able to edit their results iteratively rather than starting over on every try.

It would help if the people creating the tools had any idea how to use such tools to create things, since then they'd be able to build them to create specific things, instead of stuff that randomly happens to resemble the input description... sometimes. :/

 

 

 

 


6 minutes ago, Amberwolf said:

I don't think any of these tools will really get better until their makers listen to the experienced people using them who actually *are* artists of that type, and remake the tools so they can be used correctly, *and* make them able to edit their results iteratively rather than starting over on every try.

😁


FWIW, I wouldn't be able to use any of the output of these tools at all, except that I'm pretty good at "macgyvering" stuff (of any kind) together out of whatever I have available (a skill learned out of necessity from a young age and honed over the decades to a finely invisible edge ;) ). 

So like everything else I do, I take the bits it spits out that I see potential in, then chop them up and mix them with each other and other things and my own direct input and get something resembling what I would create if I actually had everything I *really* needed to make it. 


1 hour ago, Amberwolf said:

none of the people creating them are familiar with the art process, terminology, etc., for each of the different types of tools, so the tools don't "know" any of the terminology or processes required to describe a specific result. Thus, the user can't specify things in a well-known way and get a predictable result.

Additionally, none of the tools are editably iterative. Meaning, you can't take the result you got, tell the tool what is wrong with it, and have it go back and correct just those things, leaving everything else the same. Instead, all of the tools start over, and the user has to provide an updated description and guess at what will get the tool to spit out a more useful result.

 

But it will never be the exact result desired, because it can't be edited, and it can't be specified correctly in the first place.

 

Thanks Amberwolf. This was a great explanation, btw. It helped me get a better picture of what's going on inside AI and the challenges of programming it to that level of complexity.

But I do see hope in your statements too. It sounds to me like with some better & more complex programming, we will eventually see improvements. 

I know AI has a long way to go but every little step forward will only make me happier.  Cheers!


35 minutes ago, Amberwolf said:

I take the bits it spits out that I see potential in, then chop them up and mix them with each other and other things and my own direct input and get something resembling what I would create if I actually had everything I *really* needed to make it. 

This is exactly what I do too. But it's, of course, time-consuming and rarely ends up how I envisioned it.

I really look forward to AI getting smarter so I can use my "macgyvering" for other things. 😉


58 minutes ago, T Boog said:

Thanks Amberwolf. This was a great explanation, btw. It helped me get a better picture of what's going on inside AI and the challenges of programming it to that level of complexity.

But I do see hope in your statements too. It sounds to me like with some better & more complex programming, we will eventually see improvements. 

The thing that really has to happen is for the "programmer mentality" (I don't know a better phrase, but if you do I can replace this with it) to go away and be replaced with a "serve the user" mentality. Virtually every programmer (individual or company) I've ever attempted to work with or suggest things to has a "vision" of what they want a program to do, but that almost never coincides with what a user of that program actually needs it to do, or the way the user needs to interact with it, and when the two conflict, the user virtually always loses.

There are individual feature exceptions here and there in some programs, but as a general rule this prevents every piece of software out there from fully serving the purposes to which users need to put it.

 

Some software is much, much worse than others, and grows less usable with every iteration despite more features being added.

Some software decisions are probably controlled by marketing or whatnot for stuff that's for sale, but even with open-source free stuff the same issues arise (though at least there, when they say "go fork yourself," it means something a bit more useful :laugh: ).

 

I've attempted to communicate feedback to all of the developers of the AI tools I've tried out, with zero response from any of them yet. (Some of the tools don't have a way to provide feedback.)

The Google Labs "Image FX" tool *has* changed to include a version of some parts of the feedback I've given (though I'm certain many others have given the same feedback), but the bits they did include don't fix anything and don't make it any more useful--some of it actually makes it harder to use, with less predictable output, which makes it less useful overall. And none of the most important things have changed at all: it still doesn't understand even the most basic terminology, and it doesn't consistently follow what it sometimes does know, so you still can't predict what you will get out of it, which makes it far less useful as a real tool and, like most of these things, more of a gimmick.

 


1 hour ago, T Boog said:

This is exactly what I do too. But it's, of course, time-consuming and rarely ends up how I envisioned it.

FWIW, it's the same way I make stuff that uses vocals or real instruments, etc. Anything I can't draw MIDI notes for and play from a synth I already have, I have to find or make or build or record audio clips of, then manipulate those to roughly approximate what I would *actually* put there if I could sing or play that instrument, etc. (well, if I could play *any* instrument).

I even end up doing this for the output of the synths--there are many things not controllable via MIDI for them, or stuff that can't be done without editing the instrument itself to do those things and then running two copies of the synth (original plus edited), sometimes three or four to get the different versions.... So I end up rendering the synth out and manipulating the audio clip to do it instead, which doesn't necessarily give me exactly what I want either, but it is far faster and simpler, and lets me get on with the other bits of stuff I want to create.

 

(That's the real problem with tools for creating things--they have to be designed to make that creation easy and fast, to stay out of the way of the creative process and let the artist do what the artist needs to get done while the idea is still there... it's not like a mechanical process with specific steps that must be performed in a certain way every time, like machining something on a lathe, where you know the exact things you must do and, if you are interrupted, can always start from where you left off... creativity doesn't work like that.)

 

 

1 hour ago, T Boog said:

I really look forward to AI getting smarter so I can use my "macgyvering" for other things. 😉

It's not so much "smarter" as "better designed for the purpose". The tools can only learn the types of things they're programmed to learn. If they aren't set up to learn things specific to the task at hand, they won't improve their ability to do the task, and that's the problem right now.

 

Most of these things are built as LLMs (large language models)--you can look up the specifics of how they work, but basically they take a huge database of "input" and then train the model on that data to create patterns in its behavior. Then, while in use, they may also learn things specific to that user or task, but they are not human and cannot learn like a human can, where you can just explain what they're doing wrong and how they should do it correctly, and fix a problem that way.
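As a very rough illustration of the "patterns from data" idea (a toy only -- this is nothing like a real LLM's internals, and the training text is made up):

```python
# Toy word-prediction model: count which word follows which word in some "training data",
# then generate by sampling from those counts. Real LLMs are vastly more complex, but the
# key point is the same: the behavior is statistics soaked up from data, not rules you wrote.
from collections import defaultdict
import random

training_text = "la la ooh la la ahh la la ooh la ahh"   # stand-in for a huge database of input

follows = defaultdict(list)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current].append(nxt)           # "training": record what tends to follow what

def generate(start, length=8):
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))  # "inference": sample from the learned pattern
    return " ".join(out)

print(generate("la"))
# Nowhere is there a line of code saying "after 'la', sing 'ooh'" -- that tendency exists
# only as counts learned from the data, which is why you can't open the model and explain
# to it, in words, what it got wrong.
```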

 

I don't yet know enough to even begin creating one, but eventually my wolfy-bot project will use a type of one of these things to control its behavioral patterns, so that it can learn like a dog or wolf does, from interactions with the user, and can be trained like a real canine can, based on "instincts" that are already preprogrammed into it, which "reward" it for correctly learning something, for instance.

 

The catch with present versions of any of these systems is that they are complex black boxes--even the programmers have no real idea of what goes on inside them to take a specific set of inputs and produce a specific set of outputs. So there is no way to go in and edit a "behavior" in any of them. If a model really learns something "wrong" that you don't want in it, you'd have to erase the whole thing and start over from a backup of the model from before it learned that.
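A minimal sketch of what that "erase and restore from backup" fix looks like in practice (the weights and the training step are made up; real models have billions of parameters):

```python
# Toy model state: a handful of numbers standing in for billions of opaque parameters.
import copy

model_weights = {"w1": 0.20, "w2": -0.70, "w3": 1.30}

def train_step(weights, nudge):
    # Pretend training: every update touches many weights at once, so no single
    # weight maps to one nameable "behavior" you could go in and edit.
    return {name: value + nudge for name, value in weights.items()}

checkpoint = copy.deepcopy(model_weights)        # backup taken BEFORE the unwanted lesson

model_weights = train_step(model_weights, 0.5)   # the model "learns" something wrong

# There is no handle for deleting just that one lesson; the only fix is a full rollback,
# which also throws away everything else learned since the checkpoint.
model_weights = copy.deepcopy(checkpoint)
print(model_weights)
```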

There should be a way to back up and insert specific... well, I'll call them engrams, or behavioral routines, but at present it's like our own brains--we don't know which "neurons" and paths between them are actually used for any specific behavior or bit of "knowledge", etc. If a model is designed for it, you could retrain it to do a different behavior for an input, but just like in actual brains, the old paths and data are still there, so if they are triggered by a specific set of inputs and conditions, the old behavior could still surface even though you never want that to happen.

 

At their cores, you could think of these paths like a giant slanted table with dimples and bumps in it that you roll balls down, so that the balls are deflected down paths that have deeper "channels" more often than shallower ones, and steered away from paths that have higher bumps. Every reinforcement of a path increases the size of the dimples and bumps on the way to and through that path. But... you can't see any of those bumps or paths from outside, and you can't see where the ball actually goes while it's on the table, so you can't note down pathways and manually change them, can't back them up or copy them individually to a different table, etc. You can artificially alter the bumps or paths, but not knowing which specific behaviors they belong to, you can only make general changes (kind of like the sliders on the attitude/ability control tablets for the AIs in Westworld).
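Here's that table analogy as a tiny sketch, under the assumption that each "path" is just a weight that gets deepened every time it's reinforced (a toy, not a real training algorithm):

```python
# Three "channels" in the table, all starting at the same depth.
import random

paths = {"ahh": 1.0, "ooh": 1.0, "hmm": 1.0}

def roll_ball(paths):
    # Deeper channels catch the ball more often (weighted random choice).
    names = list(paths)
    return random.choices(names, weights=[paths[n] for n in names])[0]

def reinforce(paths, name, amount=0.5):
    paths[name] += amount          # each reinforcement deepens that channel a little

for _ in range(50):
    reinforce(paths, "ooh")        # training keeps rewarding the "ooh" path

print(paths)                       # visible here only because this toy has three paths...
print(roll_ball(paths))            # ...with billions of them, you can't find and flatten
                                   # the one bump that turns every "ahh" into an "ooh".
```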

 

You can copy an entire AI database of behaviors, but you can't pick and choose individual ones. That's something I know *could* be changed, but hasn't been yet (and I am not a programmer, so I don't know how, but there's no reason the system couldn't be designed to do this--it just hasn't been).

 

 


Most "AI" is just marketing; real AI is still in its early alpha stages, even if it has already been unleashed on the world. Real AI has to "grow" and "learn", which takes quite some time, and the AI will only be turned off (and reprogrammed and restarted) if it's seen to be going "really wrong". A friend described turning them off to reprogram, after watching them grow so far, as like killing your children 😬 It's going to get very interesting in the next decade or so.

