About This Episode
The second part of our interview with Stephen Phillips, CEO of Mawson, an A.I. lab applying machine learning and generative networks to solve fundamental problems in creative industries. If you missed the first part of our chat with Stephen, you can catch it here.
Otherwise, let's jump into it. In this episode, we talk about Mawson's current projects, generative networks in audio, how A.I. will change the music industry and what the word "artist" will mean to future generations.
Topics & Highlights
00:42 — On the problem of music recommendations & discovery algorithms
Stephen Phillips: One of the things we worked on at Hunted [and] never solved, was [similarity of music]. [...] These two songs are similar, how similar are they? I can listen to them and I can tell you how similar they are. [...] But computers can't do that — the actual audio itself was a black box. [...] Pandora's approach of human labeling or what Spotify is doing — [when] they add metadata to [a song] — is a proxy for the fact that [computers] don't understand what that sounds like. [...] We tried heaps of different things at Hunted in 2011/2012. Looking back, it was foolish because the tech didn't exist, but we didn't know that. [...] Around 2013/14 at Twitter I was seeing the [machine learning] teams and [...] then, around 2016, Google started to do a bunch of stuff around speech synthesis, and it was like: “Oh, they’re gonna solve this and it's gonna be game on.” We're gonna be able to do stuff like “give me all the songs with a female vocalist, harmonica and a bongo drum” — discovery like that. I don't know if anyone wants that, but I always fantasize about how cool that would be. [...] But right now the state of the art in recommendations is something like Discover Weekly, and they came up with a really cool idea of “things belong together if a human says they belong together, and the expression of that is putting them on a playlist”. It's the best thing anyone ever came up with for "we don't know what this is, how do we develop a proxy for it?", and their engine is genius at that.

03:46 — On Popgun and A.I. Music Composition
Stephen Phillips: My only idea at the time was [that A.I.] will change how we do music discovery. In 2016, I started Mawson with my original investor, and we wanted to do A.I. — but we just couldn't recruit anybody. Anyone who knew what they were doing had gone to Google. [...] It took me nine months to find Adam Hibble, who had a team of four or five guys [...] doing deep learning projects. [...] I basically paid him and his team to build a music discovery site based purely on raw audio, and they did it in like six weeks. [...] They had this idea — which everybody in deep learning had — [that] you understand something by being able to generate it. In the process of generating it, you create a recipe for it, and comparing the recipes for two things tells you how similar they are. So, to solve similarity and discovery, they had to generate raw audio, and then I realized: if they're gonna generate raw audio, discovery is the least exciting thing you could do with this. We're gonna write songs, we're gonna change the music industry — and that's what Popgun set out to do. [...] I brought Adam and five other young guys to come and do A.I. Music Composition. Our pitch was: we're going to have a Top 40 hit! That was our goal, and that is still the goal for the company. The company is now at twenty-something people, [...] and they have been doing, mostly in stealth, serious music composition with A.I.
David Weiszfeld: So I know there are two videos: one from a couple of years back and one from last year — they're public. Is that the latest demo that you guys have released?
Stephen Phillips: No. So we spent the first year, 2017, learning to play the piano. So how do we teach a neural net to play the piano?
David Weiszfeld: That is the demo where somebody starts playing and then the computer finishes the melody?
Stephen Phillips: No. That was the first third of that year. [...] That was our polyphonic prediction. [...] That's a very simple problem: given a sequence of things, how do I predict what would most likely come next? After that we did improvisation, [...] and by the end of 2017, after a year of ten people working on that one problem, we got to the point where we could compose original piano pieces — and we went to San Francisco. We basically said: check this out! You've heard A.I. make music before? They were like: yeah, yeah, it's always kind of crappy. Well, then check this out! And we played them an amazing piano piece and they were like "What did this?". [...] People were blown away by what it could do on a piano. And we laid out a vision of: we're gonna teach it to play every instrument, and then we're going to teach them to play together, and then we're gonna see what happens. And then we're going to give it to everybody and see what people do with it.
So in 2018, we started working: we did bass, we did drums, we did guitars. We started mixing, mastering, producing full pop songs, accompanying singers. [...] That was the end of our last demo — June 2018. [...] We are feeling somewhat in a rush, but we feel like someone's going to do this and do it properly. [...] You can half-bake this and rush it to the market — and sound like elevator music. Someone's gonna spend the time and money [...], and we have a very clear vision: it has to be as good as what an artist could do. [...] We see other people rushing A.I. Music things out, and we know that this must just be heuristics, or algorithms — because this is hard: hard to do properly and hard to do well. We just feel like it enables us to do things that previously weren't possible at all.
17:07 — On Mawson lab projects
Stephen Phillips: One of the things we learned in that first year [at Popgun] was: it's not just music that's going to be impacted by this. What we're actually doing is imitating human creative skills: playing the piano, playing the bass — these are things that humans can do well, and if you give it enough information about how to do this, it can learn to emulate that creative skill. [...] The way that we create and consume entertainment in the next five years is about to be completely changed. We have a list — in our lab here — of all the human talent skills, and how we would emulate those, and which are the most valuable ones or which ones are the most ubiquitous. The one that stood out was voice acting. [...] To be able to emulate that skill and put actors in the hands of independent filmmakers, game developers, advertisers. Having a continuous space of all the possible voices and expressions of happiness, and joy, and sadness, and being able to have that under tech's control, is a weapon in entertainment. [...] So with Replica, [...] we thought: Google and Amazon are going to work really hard on voice stuff like Siri-type things, and their home devices, and all that. But will they have children crying, laughing? Will they have a donkey hee-hawing? And how far will they take the expressiveness of this? We thought someone's going to take that to the extreme, and actually take it out of just speaking into acting, and being able to be in character, and you would be able to have a pirate talk like a pirate. [...] So we started Replica at the start of 2018. They worked on voice tech all year, and they're at Techstars now. They are cloning celebrity voices, they're doing characters for games — they're playing with a lot of those ideas.

But, essentially, the tech is very similar to what we do [at Popgun]. So, while our teams have their own code bases [and] their own companies, the class of networks we work [on is] called Generative Networks, where we basically consume a whole lot of content, and we create this distribution of all the possible variants of it. And then, [...] by poking different parts of this multidimensional space, you can generate something that's new out of it.
The guys [whose paper we read] could take a black and white photo and then blow it up into high res' and make it color. [...] They took high res' color photos and degraded them into little black and white ones, and learned how to go backwards and forwards. And so, when given a bad one, they create a super-res one. We were really interested in the idea of doing that in audio [at SUPERRES], [for] two reasons. [...] Could we make Skype sound better? That's a really interesting way to compress: I don't have to send a high res' thing — I can send a really crappy version and have a network imagine what that must have been, and get a high quality version. [And the second is] how do we take media and imagine what it must be in another form. From taking an old black and white movie and making it seem like it was high res' color in HD, [or] taking content made today and making it VR-ready. Using A.I. networks to imagine stuff is really cool, and we really love that idea — because all teams are effectively doing that.

[But] the real challenge for all these teams — and the challenge Popgun is working on — is that it's not enough to be able to generate this stuff. You have to build an interface and turn that into a tool that people can use. No one wants to press a button and out comes a song — they want to have access to that intelligence to make what they want.
26:10 — On the future of creative industries
Stephen Phillips: [All this] tech will be commonplace in the next couple of years. Like Photoshop for voice: being able to treat voice just like an image, so that you can edit it, change it, move it around, have it say all this stuff, have it speak other languages, completely change the voice identity, change from male to female, whatever — this is all going to be possible. [...] We really love the idea that all of this is just playing on a broader trend of democratization of creativity and making this transition from mass-consumption to mass-creation. These kids who grew up in Minecraft are coming through, and now entertain themselves on Fortnite and Roblox, and they're doing it by making things. They entertain themselves by being creative. [...] A.I. is just going to bring [...] new creative tools to let those kids make whatever they can imagine.
[When] we talk to music labels about this stuff, they're kind of cool with it. Initially, people were threatened, and we understand that — it's new technology, and it's going to really lower the technical bar required to make stuff that sounds good. But what it exposes is who is the star, and what is a star, and what is talent. It's much more than the ability to play an instrument. People are attracted to people because they're beautiful, they're funny or they're engaging. In music, there is no under-15 Billboard chart: as a kid you've got to compete against the adults, technically — and that's really difficult. If we remove that barrier, I reckon there [will be] young pop stars out there [and] other young people are going to really identify with [them]. A.I. is going to allow them to be discovered earlier, to communicate exactly what they're feeling and saying to each other. And I think it'll just lead to a completely new pop industry, and for the labels [...] — these people will still need to have exposure and careers managed. I see labels as VCs for the music industry, and they'll still have to invest in the talent, and I think they'll just see more of it, earlier than they've seen before, and I think it's gonna be a great boom for them. They're gonna find all these really young stars out there.
Listen as a Podcast
Links
- Popgun evolution demo
- Further information on Generative Networks and Machine Learning
- Grounding of Boeing 737 Max after 2 accidents
Companies Mentioned (in alphabetical order)
Full Transcript
David Weiszfeld [00:00]: So today you're running Mawson. We were joking because I had a problem pronouncing it, so: M.A.W.S.O.N. It's an A.I. lab in Australia, and you guys are investing in and building projects from the ground up. The three that I know — there may be others that are less known right now — are Popgun, Replica and SUPERRES. We're gonna link below the blog to the Popgun demos and stuff that are public. Could you maybe summarize the three projects and what makes them unique?
Stephen Phillips [00:44]: One of the things we worked on at Hunted all the time, which we never solved, was... At the very core of recommendation in music is the similarity metric. These two songs are similar, how similar are they? I can listen to them and I can tell you how similar they are — if I've listened to them, but computers can't do that. The actual audio itself was a black box. The best we could do is attach labels to it, text labels. And social media, or Pandora's approach of human labeling, or what Spotify is doing — kind of stuff where they add metadata to it — is a proxy for the raw fact that we don't understand, with a computer, what that sounds like. And I was always fascinated: why can't we work with the audio? Like, why do I have to wait for the crowd to tell me these two things are similar? And so, we tried heaps of different things at Hunted in 2011/2012. Looking back, it was foolish because the tech didn't exist to do it, but we didn't know that. We basically just ran up against walls continually.
But then, around 2013/14 at Twitter, I was seeing the ML teams there start to do stuff with deep learning that I hadn't... I thought: oh, this is going to be a thing, they're going to be able to... If they can do this with images, handwriting recognition and imagery — surely, someone's going to do this with audio. And then, around 2016, Google started to do a bunch of stuff out of their DeepMind team, around speech synthesis, and it was like: oh, they're gonna solve this and it's gonna be game on. We're gonna be able to do stuff like — give me all the songs with a female vocalist, harmonica and a bongo drum — discovery like that. I don't know if anyone wants that, but I always fantasize about how cool that would be.
David Weiszfeld [02:27]: Maybe with other criteria, yes, but for sure, of course!
Stephen Phillips [02:30]: That's right. Well, someone who sounds like Whitney Houston, you know, with an acoustic guitar, doing something in 3/4 time, or whatever, without any labeling, across massive catalogs.
David Weiszfeld [02:45]: Just based on the audio?
Stephen Phillips [02:48]: Solving the cold start problem that streaming services have. But right now the state of the art in recommendations is something like Discover Weekly, and they came up with a really cool idea of: things belong together if a human says they belong together, and the expression of that is putting them on a playlist. And therefore their whole thing is driven by the intersection of playlists, and it's the best thing anyone ever came up with for "we don't know what this is, how do we develop a proxy for it?". And their engine is genius at that. And we produced a bunch of things ourselves around similar ideas — but they got to scale with it. So it worked across personalization and genres; because they had the scale of this playlist engine, it did it really well. I'd seen this deep learning stuff and started to read up on it. I'd been in machine learning myself for 10 years by then, and it felt like I had to get into this space, that this was going to completely change things. My only idea at the time was: this will change how we do music discovery. This is going to be the new music discovery thing.
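To make the playlist-intersection idea concrete, here's a minimal sketch — our own toy illustration, not Spotify's actual system — of scoring two tracks as similar when humans keep putting them on the same playlists. The playlists and track names are made up.

```python
# Toy "things belong together if humans put them on the same playlist" similarity.
from collections import defaultdict
from itertools import combinations

playlists = {
    "road_trip": ["song_a", "song_b", "song_c"],
    "late_night": ["song_b", "song_c", "song_d"],
    "workout": ["song_a", "song_c"],
}

co_counts = defaultdict(int)      # how often a pair of tracks shares a playlist
track_counts = defaultdict(int)   # how many playlists each track appears on
for tracks in playlists.values():
    unique = sorted(set(tracks))
    for t in unique:
        track_counts[t] += 1
    for a, b in combinations(unique, 2):
        co_counts[(a, b)] += 1

def similarity(a, b):
    """Jaccard-style proxy: shared playlists / playlists containing either track."""
    a, b = sorted((a, b))
    co = co_counts.get((a, b), 0)
    union = track_counts[a] + track_counts[b] - co
    return co / union if union else 0.0

print(similarity("song_b", "song_c"))  # high: they co-occur on two playlists
print(similarity("song_a", "song_d"))  # zero: never on the same playlist
```

Real engines work at the scale of billions of playlists and layer learned embeddings on top, but the proxy is the same: human curation stands in for understanding the audio itself.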
And so, for the whole of 2016, I started Mawson with my original investor, and we wanted to do A.I. stuff — and we just couldn't recruit anybody. It was impossible. Anyone who knew what they were doing had already left the building, had gone to Google, had gone to America or whatever. Back in Australia, I just couldn't — they were all working on self-driving cars, or in our universities doing PhDs, and I just couldn't find... And I knew from my own experience that I can't just get web devs and teach them this. The math required to do this was beyond what I would do and what most web devs, or even normal software engineers, would do. And it took me like nine months to meet someone called Adam Hibble, who had a team of four or five guys, and they were guns for hire, doing deep learning projects around Brisbane at the time. And he and I really hit it off, and I asked him to come and work in music, and he thought that was crazy. Like, who cares? Like, if you can do this tech, this is the least interesting thing you could do. But I knew, because I knew people like Adam — he was 24/25 or something, and he reminded me a lot of myself at that age — I knew that if I could get him to work on it for a little while, he would get hooked on it like I did. And so I basically paid him and his team to build a music discovery site based purely on raw audio, and they did it in like six weeks.
The way they did it — they had this idea, which everybody in deep learning had — was that you understand something by being able to generate it. In the process of generating it, you create a recipe for it, and comparing the recipes for two things tells you how similar they are. So, to solve similarity and discovery, they had to generate raw audio, and then I realized: if they're gonna generate raw audio, discovery is the least exciting thing you could do with this. We're gonna write songs, we're gonna change the music industry — and that's what Popgun set out to do. It wasn't Popgun at that point. I was just working with Adam, and we were having these "aha" moments of: oh, we're gonna be able to compose songs here. We're going to actually be able to scan the Top 40 charts and then create music that sounds like that.
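As a rough illustration of the "recipe" idea — compress each song into a compact representation, then compare representations — here's a minimal sketch with made-up embedding vectors. In practice the vectors would come from a trained encoder; this is not Popgun's actual code.

```python
# Compare two songs by comparing their "recipes" (embedding vectors).
import numpy as np

def cosine_similarity(a, b):
    """1.0 means the two recipes point the same way; near 0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

song_a = np.array([0.8, 0.1, 0.3, 0.5])    # hypothetical embedding of song A
song_b = np.array([0.7, 0.2, 0.4, 0.5])    # hypothetical embedding of song B
song_c = np.array([-0.5, 0.9, 0.0, -0.2])  # hypothetical embedding of song C

print(cosine_similarity(song_a, song_b))   # high: similar-sounding songs
print(cosine_similarity(song_a, song_c))   # low: very different songs
```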
And so, initially, we were called Fake Records, and we were going to have a record label, we were going to release music. And then Trump came along and took over the word "fake", so we couldn't call ourselves that. And then Bob Moz, who I was really good friends with at Twitter — we'd worked together there in the last death throes of my time at Twitter — started Techstars Music and was trying to recruit teams. He suggested that Adam and I basically form Popgun and come to the program. So we did that in December 2016. We went to the inaugural class in 2017. I brought Adam and I think four or five other young guys, mid-20s, to come and do A.I. music composition stuff. Our pitch was: we're going to have a Top 40 hit! That was our goal, and that is still the goal for the company. It's now two and a bit years later. The company is now at twenty-something people, depending on how many contractors are hanging around at any point. And they have been, mostly in stealth, trying to do serious music composition with A.I. So that was the first team that came through.
David Weiszfeld [07:22]: So I know there are two videos: one from a couple of years back and one from last year — they're public. Is that the latest demo that you guys have released?
Stephen Phillips [07:30]: No. So we spent the first year, 2017, learning to play the piano. That's it. So how do we teach a neural net to play the piano?
David Weiszfeld [07:41]: That is the demo where somebody starts playing and then the computer finishes the melody?
Stephen Phillips [07:48]: No. That was the first third of that year. So that was the demo of someone playing and the system completing it — that was the demo we did for Techstars. So that was our polyphonic prediction. So, can I play something on piano, and then it will predict what I'm going to play next. After we did that, we then worked on... So that's a very simple problem: given a sequence of things, how do I predict what would most likely come next? And after that we did improvisation, which is "given a piece of music, can we improvise on this and still make melodic sense?" So, can we explore all the other ways this could be played, but still keep the musicality of this piece, so it's recognizable to musicians that yes, that's the same piece, but we're improvising with it. And then once we can do that, we're ready to actually do real composition. So, at the end of 2017, after a year of ten people working on that one problem, we got to the point where we could compose original piano pieces, and we went to San Francisco.
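Popgun's prediction and improvisation models are deep neural networks trained on real performance data; purely as a toy stand-in for the framing above — predict what most likely comes next, then sample instead of always taking the top choice — here's a tiny bigram model over an invented note sequence.

```python
# Toy next-note prediction and "improvisation" via sampling (not Popgun's model).
import random
from collections import defaultdict, Counter

melody = ["C4", "E4", "G4", "E4", "C4", "E4", "G4", "C5", "G4", "E4", "C4"]

# Count which note tends to follow which in the training sequence.
transitions = defaultdict(Counter)
for current, nxt in zip(melody, melody[1:]):
    transitions[current][nxt] += 1

def predict_next(note):
    """Prediction: return a most likely next note seen after `note`."""
    followers = transitions.get(note)
    return followers.most_common(1)[0][0] if followers else None

def improvise(start, length=8):
    """Improvisation: sample a continuation instead of always taking the top choice."""
    out = [start]
    for _ in range(length):
        followers = transitions.get(out[-1])
        if not followers:
            break
        notes, counts = zip(*followers.items())
        out.append(random.choices(notes, weights=counts)[0])
    return out

print(predict_next("E4"))   # the likeliest continuation of E4 in this toy data
print(improvise("C4"))      # a sampled variation seeded on C4
```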
We basically said: check this out. You've heard A.I. make music before? They were like: yeah, yeah, it's always kind of crappy. Well, then check this out! And we played them an amazing piano piece and they were like "What did this?". And by then, if A.I. can drive cars, surely it can play pianos! You'd expect it to be able to. And people were blown away by what it could do on a piano. And we laid out a vision of: we're gonna teach it to play every instrument, and then we're going to teach them to play together, and then we're gonna see what happens. And then we're going to give it to everybody and see what people do with it. And we found... We were lucky we met with some great investors in that month we spent in San Francisco. We ended up going with Khosla Ventures, because they'd made so many bets in A.I. by that point — I think we were the 28th, or something like that — and they had really senior, well-respected A.I. guys on their team, and we just thought they had a lot of value.
The other firms were all specialists in different things. We met Greylock, and they had some awesome guys around who had been on Facebook's, LinkedIn's and Twitter's growth teams. But that was way too early for us, because we didn't have anything to blow up — and that's what they seemed the perfect team to do. We didn't have any idea at that point how hard this was going to be. We knew that it took us a year to play the piano! How long is this going to take? And K.V. was super-patient. Most of their investments have been around materials and medical stuff. So, they were like: "Yeah, this is hard tech, this is going to take a while. We have to be patient around that type of thing." So in 2018, we started working: we did bass, we did drums, we did guitars. We started mixing, we started mastering. We started producing full pop songs. We started accompanying singers, and that was about the time... That was the end of our last demo. June 2018.
And since then we've been working on other stuff — which, if people are playing along at home, is the next inevitable thing a company that got to that point would do. We haven't released what it is; it's really hard! We think we're getting there, but we are feeling somewhat in a rush, because we feel like someone's going to do this and do it really properly, with a really talented team. And you can kind of half-bake some of this stuff, and rush it to the market, and sound like elevator music and stuff like this. Someone's gonna spend the time and the money, and we're lucky we are in Australia, on the other side of the world. We've got a great team that has been together from the start, we've got producers who work with us here in Australia, so we were able to just put our heads down, and we have a very clear vision: it has to be as good as what an artist could do. If A.I. could do this, what would it do? It has to be that good.
David Weiszfeld [11:49]: You mentioned self-driving cars, and everybody expects the self-driving car to be a thousand times more efficient than a human, because if a human crashes you're like — well, you know, it's a human. But if a robot crashes, it's like the entire fleet of cars has a problem. There were two Boeing accidents, and they grounded every 737 Max in the world, because there was something wrong with the machine. It was not a pilot mistake. So I guess it's the same thing when you start to automate things and have A.I. do things for you. You need it to be a thousand times, a million times better than a human. It needs to be flawless. Everybody needs to have the same wow in front of the tech. A song that is half good is actually a bad song. There is no half-good song — it's like, meh. You want everybody to go "WOW" in front of the song. That problem has to be extremely hard.
Cars, actually, can record: when you're driving your Tesla, for example, it's sending back all the driving information, and that's how they're building the self-driving cars. So you're listening to millions and millions and millions of songs, getting the patterns. Anybody who maybe is not technical can relate to self-driving cars — it's more of a known subject — and it's basically the same thing at the beginning, except at the end: one is a very rational driver, and the other is a very creative song. And this is where the hardness, the toughness of the creative side comes in, at the end. Getting all the information in might not be the hardest thing, but getting it to produce something creative that is new is actually the exact opposite of the self-driving car, which you want to not do anything new, and do exactly what it's supposed to do.
And you can predict what it's supposed to do, whereas for you, you can't really predict what the song is supposed to be. The song will surprise you when you listen to it. I can imagine how fascinating that project must be. So we're going to link to the first demo from Techstars, the 2017 one. We're going to link to the 2018 one. And so I guess, in a few months, before the end of the year, you are going to release the new demo, which this time, I guess, is a lot more complete and has some of the things that you just explained: the different instruments, maybe more poppy. I'm extremely curious to see what you are going to come up with!
Stephen Phillips [14:17]: I feel like everything that we did at Hunted led up to the opportunity to have people's trust and patience that would give us the freedom to try and do it right. And we kind of feel like — it's easy, too, because even though it's been two years and a long project where it can feel like you're standing still, there have just been enough points, many times through it, where you go: "This is really cool. Like, that was really cool." And we have to keep going because we did this. We haven't even really scratched the surface yet. You know what I mean? We see other people rushing A.I. Music things out, and we know that this must just be heuristics, or algorithms — because this is hard: hard to do properly and hard to do well. We just feel like it enables us to do things that previously weren't possible at all. And we have to keep going until we capture those, and we just feel — I don't know, I just feel really privileged that everything that led up to it gives me the opportunity to be near this, and be with a team of people like this when it's happening. I'm just really lucky that I don't even have to code it. I can sit back as a cheerleader for these young people tackling incredibly difficult things, knowing that it's going to bring so much pleasure to people. That feeling of creating music and sharing it — it's still such an elitist thing; not everybody can do it, or do it well, and letting other people do that is going to bring so much pleasure to people.
People look from outside and think that music is just music — and music's everywhere. It is culture to me: it's in the movies, it's... Wherever I turn, I hear music, and it does something to people that nothing else does. To be able to work in tech around that is a real privilege. I'm happy to do it for the rest of my life, and I feel lucky I fell into it. I just feel so lucky. The team, these young guys, are so committed and fell in love with the problem as well. We'll see how it plays out from here. But it's been a really fun thing to work on for the last two years, and I just know it's the same thing that happened at Hunted: no matter how this plays out, whether we get a product or not, they're the best technical team in the music industry today. I haven't met everybody, but if there's another team better than these guys anywhere in the world, I would be really surprised — just because of the circumstances in which it happened. They shouldn't be working in music; it's only because they know me and we're on the other side of the world. They could immediately go to Google and work on something like cars or medicine. I have my pick of really talented engineers here, and they're fascinated by the problem — I feel really lucky.
One of the things we learned in that first year was: it's not just music that's going to be impacted by this. What we're actually doing is imitating human creative skills: playing the piano, playing the bass. These are things that humans can do well, and if you give it enough information about how to do this, it can learn to emulate that creative skill. It's going to do that in every creative field. So for me, the way that we create and consume entertainment in the next five years is about to be completely changed. So that was when we went from... We got a very early look at what that looks like, and we knew we needed to invest in this — and that's where Replica came from. We have a list — in our lab here — of all the human talent skills, and how we would emulate those, and which are the most valuable ones, or which ones are the most ubiquitous. And the one that stood out was voice acting — speaking isn't acting; acting is much more than just speaking. To be able to emulate that skill and put actors in the hands of independent filmmakers, game developers, advertisers, all of that. Having a continuous space of all the possible voices and expressions of happiness, and joy, and sadness, and being able to have that under tech's control, is a weapon in entertainment. It'll change how we make every type of entertainment. And so with Replica — they were the second team who came in — Google had released WaveNet, which was the gun going off in that space.
And we thought: Google and Amazon are going to work really hard on voice stuff like Siri-type things, and their home devices, and all that. But will they have children crying, laughing? Will they have a donkey hee-hawing? And how far will they take the expressiveness of this? We thought someone's going to take that to the extreme, and actually take it out of just speaking into acting, and being able to be in character — you would be able to have a pirate talk like a pirate, and a knight speaking in a particular accent, and explore the possibilities of that. So we started Replica at the start of 2018. They worked on voice tech all year, and now they're at Techstars. They are cloning celebrity voices, they're doing characters for games — they're playing with a lot of those ideas.
But, essentially, the tech is very similar to what we do internally. So, while our teams have their own code bases and their own companies — I'm an investor in them — they share very much a cultural background of openness about "how do we solve these things". They have their own IP. They don't share code. They're actually a bit competitive with each other. But they do benefit from having a shared experience in a lab where they can sit down with other people — there's 35 people in here. They're able to sit down with each other and talk about problems, and we purposely have them very parallel to each other, so there is a shared experience. One's not just doing A.I. in cars while the other one is doing music. One's doing music, this one's doing voice, and SUPERRES — they're the third company who came along. And it really came out of a process: on one of the projects Popgun was doing, we'd seen a paper from someone doing super-resolution imagery, and we just loved the idea. The class of networks we work on is called Generative Networks, where we basically consume a whole lot of content, and we create this distribution of all the possible variants of it. And then, by poking different parts of this multidimensional space, you can generate something that's new out of it.
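As a generic sketch of that "poke different parts of the space" idea — not any of the lab's actual models — here is an untrained decoder that maps latent vectors to outputs; once a model like this has been trained, sampling and interpolating latent points is what produces new variants.

```python
# Generic latent-space decoding sketch: new outputs come from new latent points.
import torch
import torch.nn as nn

latent_dim, output_dim = 16, 128

decoder = nn.Sequential(
    nn.Linear(latent_dim, 64),
    nn.ReLU(),
    nn.Linear(64, output_dim),   # e.g. a short audio frame or image patch
)

# Pick two points in the latent space and walk between them.
z_a, z_b = torch.randn(latent_dim), torch.randn(latent_dim)
with torch.no_grad():
    for alpha in torch.linspace(0.0, 1.0, steps=5):
        z = (1 - alpha) * z_a + alpha * z_b       # interpolate between latents
        sample = decoder(z)                        # each point decodes to a variant
        print(f"alpha={float(alpha):.2f} -> output mean {sample.mean().item():.3f}")
```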
These guys could take a black and white photo and then blow it up into high res' and make it color. And we were like: how the hell does that work? The way they did it is that they took high res' color photos, degraded them into little black and white ones, and learned how to go backwards and forwards. And so, when given a bad one, it creates a super-res one. We were really interested in the idea of doing that in audio, for two reasons. Could we make Skype sound better? That's a really interesting way to compress stuff: I don't have to send a high res' thing — I can send a really crappy version and have a network imagine what that must have been, and get a really high quality thing. So they've been working in that class of networks, around taking content — audio first, but it also works in imagery — and bringing it back to life. But the class of networks is about how we take media and imagine what it must be in another form. So that works from taking an old black and white movie and making it seem like it was high res' color in HD, to — further into the future — taking content made today and making it VR-ready, by splitting it into what each eye must see. Using A.I. networks to imagine stuff is really cool, and we really love that idea — because all our teams are effectively doing that.
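The degrade-then-learn-to-reverse recipe described here can be sketched in a few lines. This is a toy setup with synthetic sine waves and a tiny convolutional network — assumed details only, not SUPERRES's actual pipeline.

```python
# Toy audio super-resolution: make (degraded, clean) pairs, learn to go backwards.
import torch
import torch.nn as nn

def make_pair(clean, factor=4):
    """Degrade by keeping every 4th sample, then naive-upsample back to length."""
    low = clean[..., ::factor]
    degraded = low.repeat_interleave(factor, dim=-1)[..., : clean.shape[-1]]
    return degraded, clean

# A tiny 1-D convolutional "imagination" network.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=9, padding=4),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    # Synthetic "clean audio": a random-frequency sine wave.
    t = torch.linspace(0, 1, 1024)
    freq = torch.randint(4, 40, (1,))
    clean = torch.sin(2 * torch.pi * freq * t).view(1, 1, -1)

    degraded, target = make_pair(clean)
    loss = nn.functional.mse_loss(model(degraded), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final reconstruction loss: {loss.item():.4f}")
```

The same training trick (corrupt the good version, learn the inverse mapping) is what lets such networks "imagine" a high-quality signal from a crappy one at inference time.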
Replica is imagining what a pirate sounds like. Once it's seen enough pirates — it's never seen a pirate say "Hello, my name's Stephen", but it can imagine how a pirate would say that. And it's just that class of networks. Our new teams are working in text, we're working in imagery — we think this class of networks and this idea, that A.I. can imagine new things, is going to change how we make movies, and how we make videos, and content, and music. The real challenge for all these teams — and the challenge Popgun is working on — is that it's not enough to be able to generate this stuff. You have to build an interface and turn that into a tool that people can use, because that's the whole point of this. No one wants to press a button and out comes a song. They want to have access to that intelligence to make what they want. How you control these networks is a big challenge.
David Weiszfeld [23:16]: If we have a thread through all your projects: Hunted is backend data-scraping, ranking, charting... That's 99 percent of the workload. And then, obviously, you have to make design decisions, and the lateral scroll, and the vignettes, and stuff. But without that UX, the website probably would not have taken off the way it did. But if you're an A&R at a record company, or whoever at MTV was contacting you from New York, they loved the experience and the music — how it actually works behind the scenes doesn't matter, as long as it just works. It's the Apple quote: "it just works" — most people will not try and dig underneath. If tomorrow you have a cool-looking front-end and somebody's playing around with Popgun and making a pop tune, knowing and exactly understanding how the network works, how you get data in, and how it creates something creative — this entire process doesn't really mean anything to a normal person.
When a kid plays FIFA on PlayStation, they don't really understand that the ball is being calculated, and the players' movement, and so forth. They're just playing a game. And that's the magic thing about "it just works". It's kind of like the magic of product-market fit: it's not about trying to think about cohorts and stuff — one day you just know. Yeah, taking very, very hard tech — because what you're doing is not features, these are hard tech innovations — and putting that in the hands of a normal person, a non-technician, and then succeeding: that's the ultimate test for you guys. Replica could be used by every YouTube channel, anybody that does content and wants to translate it, anybody making animation who wants to voice a character. You choose a potato head — what voice am I going to give him? You choose a pirate — what voice am I going to give him? I have this fairytale, and I need characters and animals to speak. What is the voice of a speaking pig?
People at Pixar are doing castings with voices; they are doing week-long sessions to find the right voice — you mentioned the donkey, or the pig, or whatever animal. That process could potentially be internalized by, I don't know — somebody who is at school doing graphic design and 3D animation makes his first 20-second 3D movie, and in those 20 seconds the bird speaks to the pig at one point — what voice do you put in? It's amazing to imagine that somebody could actually do all of that using his laptop, just like a musician: thirty years ago, you needed a studio, and today it's basically just in the laptop. Pushing that into voice creation — that's just amazing. Translation alone is a huge problem. The ability to translate into 50 languages on the fly would just be insane.
Stephen Phillips [26:09]: The tech will be commonplace in the next couple of years. Like Photoshop for voice: being able to treat voice just like an image, so that you can edit it, change it, move it around, have it say all this stuff, have it speak other languages, completely change the voice identity, change from male to female, whatever — this is all going to be possible. The race is on to build those out. The tech isn't there yet: there's lots of examples, there's 10 companies or more in that space, it's still quite difficult to do, there's still a bunch of problems around emotion and capturing that. The nice thing for creative people is these tools are coming in the next year or so. We really love this idea that all of this is just playing on this broader trend of democratization of creativity, and making this transition from mass-consumption to mass-creation. These kids who grew up in Minecraft are coming through, and now entertain themselves on Fortnite and Roblox, and they're doing it by making things. They entertain themselves by being creative. I used to joke with my wife, watching the obsession with Minecraft, that architecture ten years from now has got to be an explosion of designs. It just flows through to how they express themselves, and to me A.I. is just going to bring a whole suite of new creative tools to let those same kids make whatever they can imagine.
I think we're gonna find — and we talk about this a lot internally at Popgun — we're going to enter an age where... And when we talk to music labels about this stuff, they're kind of cool with it. Initially, people were threatened, and we understand that — it's new technology, and it's going to really lower the technical bar required for you to make stuff that sounds good. But what it exposes is who is the star, and what is a star, and what is talent. It's much more than the ability to play an instrument. People are attracted to people because they're beautiful, they're funny or they're engaging. In music, there is no under-15 Billboard chart. As a kid, you've got to compete against the adults, technically — and that's really difficult. If we remove that barrier, I reckon there are young pop stars out there who other young people are going to really identify with. This A.I. is going to allow them to be discovered earlier, to communicate exactly what they're feeling and saying to each other. And I think it'll just lead to a completely new pop industry. And for the labels — these people will still need to have exposure, and careers managed. I see labels as VCs for the music industry, and they'll still have to invest in the talent, and I think they'll just see more of it, earlier than they've seen before, and I think it's gonna be a great boom for them. They're gonna find all these really young stars out there.
David Weiszfeld [29:10]: Probably the way you see it: not only is it going to be maybe earlier people — people are also building art with help from A.I., collaborating with it. Or you wouldn't even know, because the A.I. is going to be in the DAW, in Pro Tools, as a VST. And so when they send you a song, it's just a song, and who knows how the song was made. But I think it goes further than that, and Replica and Popgun are signs of that. Kids are making animation videos more easily than before, kids who are YouTube people are now making music — the bridge between what is an artist: is it a musician? Is it an actor? Is it making a video? Is it a graphic designer? Kids today can make their own music videos — maybe not shot like a David LaChapelle video, but they can make a video, they can make artwork. They can collaborate with people completely on the other side of the world. Like, you are in Australia, and I'm in Paris right now. Those things we don't really think about anymore. The bet, I think, is that in five years the definition of "what is a musician" — and the idea that you grow up in your room, playing an instrument for nine years before you can kind of show the world how technically good you are — those days are maybe not 100 percent over. You'll still have guitar heroes in 20 years, people shredding and stuff. But that's going to go away, and artists are going to be combining different arts into one creation.
You see that with the companies who are doing YouTube monetization for video-gamers and teenage makeup YouTube channels, and those people are starting to make music. And so actually their manager is the YouTube monetization company, which ends up managing the singer. And so usually they end up partnering, today, with a music company, because that's not at all what they're supposed to do — they're supposed to monetize YouTube content — and they end up with a hit song. I think Republic actually just released the first single of this mega YouTube person who was not a musician, nor was he a singer, until like a month ago. And all of a sudden he has this amazing promotional platform, because of the fans he has on his other thing, and he's probably like a 17-year-old kid. Yeah, it's gonna go exponentially fast. It's been an hour and a half, and I took way too much of your time — so I wanted to wrap up with three standard questions we're going to ask everybody. One: I cannot imagine how that meeting would go between you right now and your 19-year-old self. At 19 you came out of college, or you were still in college. You were going to work at a consulting kind of company, working with 25 different tech projects. Hunted was not even something you were thinking about; the news website was maybe something you would start to think about soon. What would you tell the 19-year-old Stephen Phillips?
Stephen Phillips [32:11]: I don't know. I had a very good 20s. I had lots of different jobs, and I never stayed anywhere longer than two years, ever. I would save up enough money, and then I would do whatever I wanted for a year. I painted for a year, I made music for a year. I'd go back when I was broke and work again. I felt like I needed it. I just felt like I'd never... I should have been looking for a mentor or someone I respected who could help me find a path. I'd probably tell myself to start a company. I don't know why it took me till I was 35 to do that. It just didn't seem like a thing. I didn't grow up in the Valley. It wasn't a thing that people did in Australia — start tech companies. I never really heard about it until I was in my mid-twenties, and by then I just didn't know how to do that. So my advice would be to start a company, probably. It's the advice I give the young guys now, especially these guys, who are so talented that they've got these unis trying to make them do PhDs, which I think is crazy. You know: go and start a company, this is the perfect time — you've got no family, you've got no commitments. You can always go back and get a boring job. So I'd probably say: yeah, go and start a company. Don't wait 20 years to start a company.
David Weiszfeld [33:31]: Bob Moz from Techstars, I think he was explaining their thesis, which is that talent is completely evenly distributed. There are talented people in Melbourne, and Sydney, and Canberra, and Paris, and Berlin — wherever. But opportunity is not. And because opportunity is not, I think the truth is that we, as maybe non-California people and non-New York people, don't really believe we're... Like, am I really going to start a company? Am I gonna do a music thing? Am I going to go to New York and be acquired by Twitter, or actually keep the company for a while? And so yeah, while talent is completely evenly distributed, opportunity is not — but also self-belief, because it's in the genes of Californians to believe that they're going to change the world; it's not in the genes of Australians, and definitely not in the genes of the French. So yeah, start a company, believe in the projects, and if you have an idea — just kind of go and do it and test it!
Stephen Phillips [34:25]: The big learning for me, when I arrived in the States, was: they are just dudes, they are not any better than we are. They just had a completely different expectation of what is possible, and that's what I spend a lot of time talking to the young guys here about. I know that they seem like superhuman people because they produce so much cool shit out of there, but they're just dudes and we can beat them. We can compete with them. They're not any smarter than we are — there are just more of them. There's more money, there's more backing, there's more confidence: those are the things we don't have. We'll find the money and we'll develop the confidence. One of the first things I do when I recruit is send them over there, and they come back going: they're just dudes! Yeah — I told you that.
David Weiszfeld [35:09]: You don't even have the language barrier, so you can send somebody over there, and they can realize that it's the same. It's just that people over there think more, they believe more. I think in America kids get, you know, a lot of presentations in class, and so you're used to speaking in front of a lot of people, defending your project, pitching, almost — it's not a pitch, but it's almost there. In France they teach you a lot about self-criticism, and you know how to look at the thesis, the antithesis. You have to contradict yourself all the time. And so it makes you not have that "I believe and I go". It's more like: I believe, but also I doubt. And then I will try to believe something different, and then doubt. And we love that kind of debate. Is there a specific book or podcast that we should link to — something that you love, or a book you've been reading?
Stephen Phillips [36:03]: Now I'm embarrassed to say — I haven't. I used to read ferociously in my 20s, and now I consume as much music and media as I can. I don't really feel like I have the time to read now. I feel like my responsibility is to my teams and my staff, and I spend every moment that I'm not spending with my family working for them. So I feel like when I retire in 15/20 years, I'll read every book that I've missed out on reading.
David Weiszfeld [36:38]: What are you going to do right after this interview? I'm guessing, since it's 8:40pm, probably have dinner?
Stephen Phillips [36:41]: I'm going to go home to see my family, and I've got an early flight to Sydney in the morning, to see music people in Sydney. So good. Thank you!
David Weiszfeld [36:58]: It was amazing. I actually want to listen to it right now. Thank you so much for your time at such a late hour. We'll speak soon. Thanks a lot.