FREE AI Vocal Remover


satya

This is light years ahead of Spleeter. Like, literally waaaay, waaaaaay better. I found out about it a few months ago and was completely blown away. I wanted to share it here but never found the time to write a good post with examples comparing it to Spleeter. Spleeter is like a very low-bitrate MP3 compared to the high quality of this. Besides, it is really easy to install and use.

2 hours ago, Jeslan said:

This is light years ahead of Spleeter. Like, literally waaaay, waaaaaay better. I found out about it a few months ago and was completely blown away. I wanted to share it here but never found the time to write a good post with examples comparing it to Spleeter. Spleeter is like a very low-bitrate MP3 compared to the high quality of this. Besides, it is really easy to install and use.

I would calibrate my expectations better. Demucs is not a night-and-day difference compared to Spleeter like that. You get slightly better vocal separation, but it's still the same MusDB dataset with training limited to 11k, so the results are more or less the same in practice.

The two mashups I made and posted here:

Had their stem separation done with Demucs using UVR. I still had to do quite a bit of work on the second one, because you can still hear some of the high transients of the cymbals and drums in the vocals, and the first one still had some lead guitar bleeding into the vocal stem. I still went with Demucs because the quality of the stems was slightly better.

Edited by Bruno de Souza Lino

On 7/28/2023 at 11:18 PM, Bruno de Souza Lino said:

I would calibrate my expectations better. Demucs is not a night-and-day difference compared to Spleeter like that. You get slightly better vocal separation, but it's still the same MusDB dataset with training limited to 11k, so the results are more or less the same in practice.

I was actually not talking about Demucs but about UVR's own default model. I'm not sure, but I think it's MDX-Net? THAT AI separation model IS way, way better than Spleeter and Demucs. That is the one I was saying is night and day. If I recall correctly, that model was actually trained by the creator of the app (along with some others for his Patreon supporters).

I think I still have the Telegram files I sent to a friend to show how mind-blown and excited I was, using a 10-second fragment of the anime opening I was watching back then.

Damn, that was already back in November last year, which is a lot more than just a few months. Anyway... the original fragment:

https://app.box.com/s/f8g41psi73741ghrxm43dschabdrb4f6

This is the vocal track isolated with Spleeter:

https://app.box.com/s/3ilhhdogkimh0jsapgozbwa3y06pcebj

This is the vocal track isolated using the default MDX-Net model I was mentioning:

https://app.box.com/s/bx3r5r629u3z494ou6eub1swl5tk6mz9

And this is the instrumental with the vocals from that isolation removed, which sounds literally like it never had any vocals over it in the first place:

https://app.box.com/s/61rg42cu9n5vi5rahulavvqsrvjl5ptw

Sadly, I don't have the Spleeter instrumental, but it sounded really awful.

Thing is, if this is not a night-and-day difference and completely mind-blowing, I don't know what would be.

I also remember testing something far more extreme: some songs from Children of Bodom's Hatebreeder album (I am not a fan of that kind of voice, but I love the instrumentals). While it was obviously not as good at extracting and removing those vocals, the result was still mind-blowing, considering it was death metal with a voice barely recognizable as human. In fact, when I put the vocal and instrumental tracks back together in a DAW and lowered the vocal track volume (by a lot), I was able to achieve 100% of what I was looking for.
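The "put both tracks back together and lower the vocal" step described in this post is just a weighted sum of the two stems. Here is a minimal Python sketch of it, assuming two already-separated, time-aligned, equal-length mono stems stored as lists of float samples. The helper names `db_to_gain` and `remix` and the -20 dB value are hypothetical, illustrative choices, not anything taken from UVR or from the post.

```python
from __future__ import annotations


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def remix(instrumental: list[float], vocals: list[float], vocal_db: float) -> list[float]:
    """Sum two mono stems sample by sample, with the vocal stem offset by vocal_db.

    Hypothetical helper: stems are assumed to be time-aligned, equal-length
    lists of float samples, as exported by a separation tool.
    """
    if len(instrumental) != len(vocals):
        raise ValueError("stems must have the same length")
    gain = db_to_gain(vocal_db)  # e.g. -20 dB -> 0.1x amplitude
    return [i + gain * v for i, v in zip(instrumental, vocals)]


if __name__ == "__main__":
    instrumental = [0.5, -0.25, 0.0]
    vocals = [0.8, 0.8, -0.4]
    # Pull the vocal down "by a lot" before summing it back in.
    print(remix(instrumental, vocals, vocal_db=-20.0))
```

A DAW fader applies the same 10^(dB/20) amplitude factor; at 0 dB the two stems are summed unchanged.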

Edited by Jeslan

On 7/31/2023 at 5:11 PM, Jeslan said:

This is the vocal track isolated using the default MDX-Net model I was mentioning:

On 7/31/2023 at 5:11 PM, Jeslan said:

this is the instrumental with the vocals from that isolation removed, which sounds literally like it never had any vocals over it in the first place:

Finally got a chance to listen to these examples on my studio monitors.

Wow, big difference for sure.

I am downloading and installing the software. If it works as well as these examples, I will be sending a donation.

On 7/31/2023 at 7:11 PM, Jeslan said:

I was actually not talking about Demucs but about UVR's own default model. I'm not sure, but I think it's MDX-Net? THAT AI separation model IS way, way better than Spleeter and Demucs. That is the one I was saying is night and day. If I recall correctly, that model was actually trained by the creator of the app (along with some others for his Patreon supporters).

Yes, the separation in MDX is better than that of Demucs and Spleeter, but it only separates vocals from the instrumental, which makes it not as useful. Granted, that could be circumvented by using UVR's "Ensemble Mode" and running multiple models, but that turns an operation that is already not fast (my separation times for songs of varying length tend to be around 40-50 minutes) into something even slower.
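Ensemble Mode needs a full separation pass per model before anything can be combined, which is why it adds to the processing time. As a rough illustration of the combining step only, here is a Python sketch that averages several models' estimates of the same stem, sample by sample. It assumes plain waveform averaging on time-aligned mono lists of floats; `ensemble_average` is a hypothetical helper, and UVR's actual combining method may differ.

```python
from __future__ import annotations


def ensemble_average(estimates: list[list[float]]) -> list[float]:
    """Average several models' estimates of one stem, sample by sample.

    Hypothetical helper: every estimate must be a time-aligned mono list of
    float samples with the same length.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    length = len(estimates[0])
    if any(len(e) != length for e in estimates):
        raise ValueError("all estimates must have the same length")
    count = len(estimates)
    return [sum(e[i] for e in estimates) / count for i in range(length)]


if __name__ == "__main__":
    model_a = [0.2, 0.4, -0.1]  # hypothetical vocal estimate from one model
    model_b = [0.4, 0.0, -0.3]  # hypothetical vocal estimate from another
    print(ensemble_average([model_a, model_b]))
```

The averaging itself is cheap; when the models run one after another, the total time is roughly the sum of each model's own separation time.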

1 hour ago, Bruno de Souza Lino said:

Yes, the separation in MDX is better than that of Demucs and Spleeter, but it only separates vocals from the instrumental, which makes it not as useful. Granted, that could be circumvented by using UVR's "Ensemble Mode" and running multiple models, but that turns an operation that is already not fast (my separation times for songs of varying length tend to be around 40-50 minutes) into something even slower.

In each tool, the separation is "good" or "bad" depending on how well the target material matches the training data. In one recent exercise, Demucs separated the voice and guitar way better than Spleeter, but it failed to "find" the drums and percussion in that recording, while Spleeter found them.

It can make sense to separate with different tools, e.g. remove (separate) the vocal with MDX or another model, and then separate the rest with Demucs or Spleeter. The separated stems don't always have to "sum" correctly. For example, when remixing a live recording, you can use "the best" tracks for drum replacement, as side-chain inputs for dynamics/EQ, etc. For that purpose, what matters most is that the signal exists in a particular track, not its fidelity.
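Whether a set of stems from different tools still "sums" back to the original can be checked numerically. A minimal Python sketch, assuming time-aligned, equal-length mono stems as lists of float samples: `residual_ratio` measures how much of the mixture's energy is left over after subtracting all stems, and `complement` shows one way (subtracting a stem from the mixture) to get a partner track that does sum back exactly. Both helper names and the sample values are hypothetical; none of this is an API of UVR, Demucs or Spleeter.

```python
from __future__ import annotations


def complement(mixture: list[float], stem: list[float]) -> list[float]:
    """Return mixture minus stem, so that stem + complement equals the mixture."""
    if len(mixture) != len(stem):
        raise ValueError("stem must match the mixture length")
    return [m - s for m, s in zip(mixture, stem)]


def residual_ratio(mixture: list[float], stems: list[list[float]]) -> float:
    """Energy of (mixture - sum of stems) relative to the mixture's energy.

    0.0 means the stems add back up to the mixture exactly; larger values mean
    more of the original is missing from, or was added by, the stems.
    """
    if any(len(s) != len(mixture) for s in stems):
        raise ValueError("all stems must match the mixture length")
    mix_energy = sum(m * m for m in mixture)
    if mix_energy == 0:
        raise ValueError("mixture is silent")
    residual = [m - sum(s[i] for s in stems) for i, m in enumerate(mixture)]
    return sum(r * r for r in residual) / mix_energy


if __name__ == "__main__":
    mixture = [1.0, 0.5, -0.5]
    vocals = [0.6, 0.2, -0.3]  # hypothetical vocal estimate from one tool
    drums = [0.3, 0.1, -0.1]   # hypothetical drum estimate from another tool
    print(residual_ratio(mixture, [vocals, drums]))
    print(residual_ratio(mixture, [vocals, complement(mixture, vocals)]))
```

Stems taken from different tools will usually leave some residual. As the post says, that is fine when a track only serves as a drum-replacement trigger or side-chain source, where what matters is that the signal is present.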

17 hours ago, Grem said:

I am downloading and installing the software. If it works as well as these examples, I will be sending a donation.

15 hours ago, Grem said:

@Jeslan You were right. Night and day difference!!

Glad to know I am not the only one appreciating how good this tool really is. The creator completely deserves a donation, that's for sure.

I must say I felt compelled to post here and give my two cents because, to be honest, the thread was being completely unfair about the quality this tool really offers, and especially misleading about how useful and interesting it can be for the users of this specific forum.

On 7/25/2023 at 8:38 PM, Bruno de Souza Lino said:

The only advantage of UVR in comparison to other tools, including paid ones, is that it allows you to use Demucs4 Hybrid, which gives slightly better results than Spleeter.

Quote

It performs the same if you use the Spleeter algorithm, which is what Moises and many other online services use.

First of all, I want to make absolutely clear that none of the following is meant as a personal attack, but as completely sincere, constructive criticism. So I will borrow the great quote from Jodorowsky, "I am responsible for what I say, not for what you understand," just in case it ends up being taken the wrong way.

Now, with that said, I want to start by commenting on those quotes of yours. They are what made me feel I had to write my first comment in this thread. First, because they were not true at all (the little example I showed later speaks for itself), and to be honest, it felt like a completely unfair thing to say: giving such a strong and misleading opinion to others, seemingly without really knowing the tool at all.

Quote

I would calibrate my expectations better. Demucs is not a night-and-day difference compared to Spleeter like that.

And here starts the actual constructive criticism. When you read what I wrote, you had two options: ask why I would say such a thing about the tool, or simply dismiss it like you did (and, to be honest, in an especially condescending way).

Quote

Yes, the separation in MDX is better than that of Demucs and Spleeter, but it only separates vocals from the instrumental, which makes it not as useful.

You keep repeating "Demucs" and "Spleeter" over and over, as if this tool were directly related to, built around, or purposely created for the Demucs model. And that is the thing: UVR is NOT about Demucs. Demucs is just one of the many AI models it supports, a bonus for the few people who may be interested in that model.

There is a reason the tool is called "Ultimate Vocal Remover" and the default AI model it uses is "MDX-Net". It is literally a tool for isolating and removing vocals. At that it excels, and it is a night-and-day difference from other tools or models like Demucs and Spleeter. You simply can't say "MDX is better than Demucs and Spleeter" (when it is an order of magnitude better, not just better) and then follow it with an absurd negative critique and an "it is not as useful", especially when removing or isolating vocals from instrumentals has always been the main thing people want. Seriously, you simply can't.

If you don't have deep enough knowledge of the tool or of how to get the best out of it, it is just as easy not to say anything. If the problem is that the tool doesn't suit your workflow or the kind of work you are doing, the same applies, or just say it is not suited to you and explain why.

But please, stop giving unhelpful negative critique and strong false opinions just for the sake of it about something that really can be mind-blowing and incredibly useful for so many people here. (Of course, the same critique applies to everyone and everything, not just you and this specific tool.)

Quote

it only separates vocals from the instrumental, which makes it not as useful.

Just for the sake of others who may not know: all Demucs and Spleeter can separate beyond vocals and instrumental are drums, bass and, with some models, piano. They can't separate every single element of a track. Piano separation is not particularly useful, except for a few people who may be especially interested in it. A drum track could be useful to a lot more people. But if that is the only interest, UVR (Ultimate Vocal Remover) is not going to offer much more (in terms of quality) than ease of use, compared with manually installing and running models like Demucs. Again, this tool is intended for and specialized in vocal extraction/removal, which is where it shines. Maybe there is or will be a UDR (Ultimate Drum Remover or something like that), as the same AI technology could be trained and specialized just for that "easily" too.

Quote

but that turns an operation that is already not fast (my separation times for songs of varying length tend to be around 40-50 minutes) into something even slower.

Also, for the sake of knowledge and accuracy, I will comment on this. I must say this information isn't entirely accurate either, and it needs some explaining. For instance, on my computer back then (which had an AMD Phenom X4 1090T CPU from 2010), it mostly took a couple of minutes. But that alone wouldn't be accurate either, for the following reason: I had an Nvidia GTX 1060 GPU (from 2016). This tool uses neural networks, a kind of artificial intelligence that is extremely demanding in terms of computational power. In fact, it is something of a miracle that we can run so much of it on consumer hardware nowadays. Much of this technology was, and still is, only suitable for running on incredibly pricey supercomputers (as is the case with ChatGPT, for instance). The thing is, gaming cards can do the same kind of computational work as those supercomputers, but on a smaller scale (especially given the very small amount of memory available on consumer graphics cards).

So, in short: if you have an Nvidia gaming card (laptop or desktop), you will be able to run an extraction with this tool in anywhere from seconds to a few minutes, depending on the graphics card (just make sure you select the GPU option). I don't know the exact situation for AMD and the new Intel Arc graphics cards with this tool, but I know they can be put to work too, although in the worst case it may require some technical knowledge and following tutorials from the internet.

Those who don't have a suitable graphics card have to use the computer's CPU, which takes a lot more time (depending on the CPU), as with the times mentioned earlier in the thread (in the case of an ensemble).

Anyway, even on a CPU, a single vocal extraction shouldn't take more than 10 minutes, if I recall correctly.

Edited by Jeslan

19 minutes ago, Jeslan said:

You keep repeating "Demucs" and "Spleeter" over and over, as if this tool were directly related to, built around, or purposely created for the Demucs model. And that is the thing: UVR is NOT about Demucs. Demucs is just one of the many AI models it supports, a bonus for the few people who may be interested in that model.

There is a reason the tool is called "Ultimate Vocal Remover" and the default AI model it uses is "MDX-Net". It is literally a tool for isolating and removing vocals. At that it excels, and it is a night-and-day difference from other tools or models like Demucs and Spleeter. You simply can't say "MDX is better than Demucs and Spleeter" (when it is an order of magnitude better, not just better) and then follow it with an absurd negative critique and an "it is not as useful", especially when removing or isolating vocals from instrumentals has always been the main thing people want. Seriously, you simply can't.

If you don't have deep enough knowledge of the tool or of how to get the best out of it, it is just as easy not to say anything. If the problem is that the tool doesn't suit your workflow or the kind of work you are doing, the same applies, or just say it is not suited to you and explain why.

But please, stop giving unhelpful negative critique and strong false opinions just for the sake of it about something that really can be mind-blowing and incredibly useful for so many people here. (Of course, the same critique applies to everyone and everything, not just you and this specific tool.)

I sense someone may have an AI-driven agenda, especially when a response to a post that was primarily my opinion about how I use the tool shows this level of pedantry and condescension. I hope whoever asked you to shill is paying you well, 'cause I don't earn anything for stating my opinions.

22 minutes ago, Jeslan said:

Just for the sake of others who may not know: all Demucs and Spleeter can separate beyond vocals and instrumental are drums, bass and, with some models, piano. They can't separate every single element of a track. Piano separation is not particularly useful, except for a few people who may be especially interested in it. A drum track could be useful to a lot more people. But if that is the only interest, UVR (Ultimate Vocal Remover) is not going to offer much more (in terms of quality) than ease of use, compared with manually installing and running models like Demucs. Again, this tool is intended for and specialized in vocal extraction/removal, which is where it shines. Maybe there is or will be a UDR (Ultimate Drum Remover or something like that), as the same AI technology could be trained and specialized just for that "easily" too.

And if you had read what I wrote more carefully, you'd have figured out that I was talking about MDX-Net, not Spleeter or Demucs.

23 minutes ago, Jeslan said:

Also, for the sake of knowledge and accuracy, I will comment on this. I must say this information isn't entirely accurate either, and it needs some explaining. For instance, on my computer back then (which had an AMD Phenom X4 1090T CPU from 2010), it mostly took a couple of minutes. But that alone wouldn't be accurate either, for the following reason: I had an Nvidia GTX 1060 GPU (from 2016). This tool uses neural networks, a kind of artificial intelligence that is extremely demanding in terms of computational power. In fact, it is something of a miracle that we can run so much of it on consumer hardware nowadays. Much of this technology was, and still is, only suitable for running on incredibly pricey supercomputers (as is the case with ChatGPT, for instance). The thing is, gaming cards can do the same kind of computational work as those supercomputers, but on a smaller scale (especially given the very small amount of memory available on consumer graphics cards).

Any more trivia you want to impart to people before they get bored of being patronized?

26 minutes ago, Jeslan said:

So, in short: if you have an Nvidia gaming card (laptop or desktop), you will be able to run an extraction with this tool in anywhere from seconds to a few minutes, depending on the graphics card (just make sure you select the GPU option). I don't know the exact situation for AMD and the new Intel Arc graphics cards with this tool, but I know they can be put to work too, although in the worst case it may require some technical knowledge and following tutorials from the internet.

Those who don't have a suitable graphics card have to use the computer's CPU, which takes a lot more time (depending on the CPU), as with the times mentioned earlier in the thread (in the case of an ensemble).

Anyway, even on a CPU, a single vocal extraction shouldn't take more than 10 minutes, if I recall correctly.

So, you weren't content with the first paragraph and had to summarize what you already said?

In the future, I'd advise you to use the AI tools you're so passionate about to write your posts for you. They'll come out shorter, straight to the point, and with less patronizing, condescension and pedantry.

Bruno, to be honest, I almost didn't try this software because of some of your comments. I have tried vocal remover (VR) tools before and was never really satisfied. As I read your posts, whether you intended it or not, I understood them to be saying this is just another program using algorithms that are already out there, which I had probably already tried. So I wasn't going to bother with it.

Then Jeslan posted the examples. What sold me on it were the pristine reverb tails on the isolated vocals. No artifacts at all. None that I could hear, anyway.

So I said if it can do that for the songs I want to use it on... I'm in!

It worked. And it works well. The resulting instrumental track let a client clearly hear the piano part she had never been able to hear before, isolated, without the vocals.

The tracks I have tried it on all sound like the original (just a tad brighter to my ear compared with the original). I mean, the resulting instrumental track sounds as if no vocals were ever recorded on it.

The ability to send batches of files to the program is wonderful!! The ability to save settings is also a great benefit. 

This is a free program. Grab it while you can!! Use the MDX algo. 

 

This topic is now closed to further replies.