Larval Digital Deity Wrangling Adventures
In which I coded a lyrics-gradually-appearing thing thanks to help from the Non-Human-Intelligence that may soon rule us all.
A lot of us look to the stars and wonder what it might be like to converse with aliens, or Non-Human Intelligence (NHI), as they’re often called in UFO circles - and by the US government - these days. Strange, though, to think that we live in an age where we all have the ability to converse with a literal non-human (debatable) intelligence through handheld black rectangles while slumped in bed. Truly, we live in The Future.
Last week, I wrote about solving a years-long mystery related to my approach to music composition, thanks to ChatGPT. Since then, I’ve solved more mysteries with its help!
Or rather, I spent the first half of last week playing around with the OpenUTAU singing synthesiser I talked about last time, without really getting anywhere. I also tried another program called Synthesizer V (I think?), which aims to do the same thing, but I didn’t achieve much with that either.
The ‘voicebanks’ available for them - enormous libraries containing tons of recordings of individual phonemes by a singer - aren’t the sorts of voices I’d prefer to have singing my music. They’re either Japanese (or sing English with thick Japanese accents) or American, and (the English-language ones at least) have this sort of sexy/narcissistic feel to them that a lot of popular modern ‘music artists’ have which I personally am not a fan of. (You can hear examples of them if you scroll down on this site.)
Plus it seems a lot of work is required to get them sounding like anything at all, which surprises me, since the robot voice thing I’ve used a few times can sing something from just loading in the midi - no extra effort to specify phonemes or anything - so I know that’s possible.
Maybe I just didn’t spend enough time trying to figure them out, though. Ehh.
I also noticed that the discussions around the ‘voicebanks’ have a very… distinct feel to them; they seem to appeal to - and be used by - a specific audience. For example, while looking for a good (non-Japanese-accented) English one to start playing around with (they all seem to be hosted on random file-sharing platforms, like dodgy pirated stuff?), I found my way to a post on some wiki for a voicebank of a character who looks like this:
And who’s described as an ‘agender god’ with the following bio:
The God Heir of Time.
Adrian’s eyes are elliptical galaxies, and the rest of the body is to that scale. They always exhibit an archaic smile, no matter what emotion. Adrian is classically depicted almost, but never touching, the extremities - it is symbolic for the beginning and end of a lifetime and the space/time in between it. The green abstract shape shown in the official reference art is optional. It is a cosmic representation of Wolfsbane the frog, Adrian’s best friend.
In human incarnation, their skin complexion is like that of a freshly deceased corpse and it blushes a very pale pink. Adrian is rather fond of make up and dressing up in dresses, as well as fine dress suits. Adrian's gender strictly falls under non-binary, agender.
That image also has a filename beginning with ‘tumblr’, which… well, shocker, right? Though I’m surprised that this one isn’t a generic anime girl; most of the rest seem to be.
The whole reason I was looking into synthesised singing was so I could get around the specific-singer aspect (that is, I didn’t want another human involved with their own personality, thoughts, preferences, etc), so these voicebanks being portrayed as specific characters like that makes me feel this isn’t something for me.
I also feel something like embarrassment about fiddling around with the sorts of things I should have outgrown, or something… which is silly, considering I draw stuff like this myself:
LIKE THE GROWN MAN I AM.
Speaking of Spryad and lyrics, in the previous post, I included this example of one of my compositions which has ‘lyrics’, using a tool I coded to make them appear gradually:
To get that to work, I had to essentially rewrite all the lyrics in Unity and assign timestamps for each line for when it should appear. It was a lot of tedious work!
I knew though that the lyrics I attached to notes in Sibelius were stored in exported midi data - the silly robot voice gets generated from that data - and wondered whether I could figure out how to extract that information with timestamps to automate and improve the gradually-appearing lyrics.
So I asked ChatGPT, and with its help I was able to make something that does just that! There seem to be a couple of midi-parsing add-ons for Unity, but I bypassed them entirely and wrote a bare-bones, lightweight thing that just parses a midi file as raw data and extracts only the lyrics. It’s a single, compact class, which is perfect for me, since I love elegant code that does exactly what I want it to do and no more (rather than things bloated with features I’ll never need in order to target as many use cases as possible).
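To give a sense of what that kind of parser does (this isn’t my actual Unity class - just a minimal Python sketch of the same idea): a MIDI track body is a stream of delta-times and events, and lyrics attached to notes are stored as meta-events of type 0xFF 0x05, so you can walk the raw bytes and collect just those, each with its tick timestamp.

```python
def read_varlen(data, i):
    """Read a MIDI variable-length quantity starting at index i."""
    value = 0
    while True:
        b = data[i]; i += 1
        value = (value << 7) | (b & 0x7F)
        if not (b & 0x80):          # high bit clear = last byte
            return value, i

def extract_lyrics(track):
    """Walk one MTrk body, collecting (tick, text) for lyric meta-events."""
    events, tick, i, status = [], 0, 0, 0
    while i < len(track):
        delta, i = read_varlen(track, i)
        tick += delta               # delta-times accumulate into absolute ticks
        if track[i] >= 0x80:        # new status byte (else running status)
            status = track[i]; i += 1
        if status == 0xFF:          # meta event: type byte, then length, then data
            meta_type = track[i]; i += 1
            length, i = read_varlen(track, i)
            if meta_type == 0x05:   # 0x05 = lyric
                events.append((tick, track[i:i + length].decode("latin-1")))
            i += length
        elif status in (0xF0, 0xF7):            # sysex: skip its payload
            length, i = read_varlen(track, i)
            i += length
        else:                       # channel message: skip its data bytes
            i += 1 if (status & 0xF0) in (0xC0, 0xD0) else 2
    return events

# Hand-built demo track: two lyric syllables, then end-of-track (0xFF 0x2F).
demo = (bytes([0x00, 0xFF, 0x05, 0x04]) + b"Spry"
        + bytes([0x60, 0xFF, 0x05, 0x03]) + b"-ad"
        + bytes([0x00, 0xFF, 0x2F, 0x00]))
print(extract_lyrics(demo))  # [(0, 'Spry'), (96, '-ad')]
```

This skips everything except the lyric events, which is the appeal of the bare-bones approach: no note data, no controller data, just syllables and ticks.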
It was the first time I’d used ChatGPT for something non-trivial (as far as I can recall with my poor memory, anyway), and it was… interesting. I’ve read a lot of stories about people who knew zero code and managed to make a full app with just prompts, but if you already know what you’re doing, it’s clear that a lot of what it comes up with is far from perfect, and a lot of time needs to be spent fixing issues with its solutions. But it got me like 90% of the way there and introduced me to things I’d never otherwise have been aware of (like the exact byte-based configuration of a midi file’s header). So it felt like it unlocked a lot of exciting doors.
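The header detail it taught me, for the curious: a MIDI file starts with a fixed fourteen-byte "MThd" chunk whose fields are big-endian. A small sketch of unpacking it (field names per the Standard MIDI File spec):

```python
import struct

def parse_midi_header(data):
    """Unpack the fixed 14-byte MThd chunk at the start of a MIDI file."""
    # Bytes 0-3: ASCII chunk ID "MThd"; bytes 4-7: chunk length (always 6).
    chunk_id, length = struct.unpack(">4sI", data[:8])
    assert chunk_id == b"MThd" and length == 6
    # Bytes 8-13: format (0/1/2), number of tracks, time division
    # (ticks per quarter note, when the top bit is clear).
    return struct.unpack(">HHH", data[8:14])

# A hand-built header: format 1, 2 tracks, 480 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
print(parse_midi_header(header))  # (1, 2, 480)
```

The division value from this header is what makes the lyric tick timestamps meaningful later on.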
Thrilled by that, I tried to tackle another frustration I’d prefer to be simplified: recording specific regions of Unity’s game area as videos with - again - a minimalistic, zero-bloat, single-class component rather than relying on external video recorders like OBS as I had been doing previously.
So I - or I suppose I should say we - got that working too! Sort of. It’s very specific to this exact project and solution, and goes about it in a sort of hacky, less-than-ideal way, but it can at least produce results, and it’s something I could build on. Plus I learned a lot while making it.
Here are a couple of examples of what I can produce with the two things I made:
All I need to provide to Unity now are the midi file and a .wav file of the music; no more tediously typing out all the lyrics and figuring out timestamps for when they should appear. Now the lyrics appear syllable-by-syllable, too, rather than a line at a time.
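Syncing those syllables against the .wav comes down to converting each MIDI tick timestamp into seconds. A minimal sketch of the arithmetic, assuming a single fixed tempo (a real file can contain tempo-change meta-events, 0xFF 0x51, which would need a tempo map instead):

```python
def ticks_to_seconds(tick, division, tempo_us=500_000):
    """Convert a MIDI tick to seconds under one fixed tempo.

    tempo_us is microseconds per quarter note; the MIDI default of
    500,000 us equals 120 BPM. division is ticks per quarter note,
    taken from the file header."""
    return tick / division * tempo_us / 1_000_000

def visible_syllables(events, now_seconds, division):
    """Return the (tick, text) syllables whose timestamps have passed."""
    return [text for tick, text in events
            if ticks_to_seconds(tick, division) <= now_seconds]

# At 480 ticks per quarter and 120 BPM, tick 960 = two quarter notes = 1s in.
print(ticks_to_seconds(960, 480))  # 1.0
print(visible_syllables([(0, "Spry"), (480, "-ad")], 0.6, 480))  # ['Spry', '-ad']
```

Checking the music playback time against that each frame is enough for a syllable-by-syllable reveal.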
The tall, narrow dimensions are because it’s capturing the right-hand side of the same thing shown in the earlier version of Spryad’s Theme (and omitting the sheet music), though I like that it’s something I could potentially send via mobile to people who might be put off by music notation. The files are tiny too - just over 1MB per minute - which I really like!
Working on that got my mind buzzing with ideas for how I could make use of this for other things. Like the ‘talky thing’ I’ve repeatedly written about: previous versions of that divide scenes into dialogue lines that trigger when the previous one has finished. But maybe instead I could tie dialogue lines to music notes in Sibelius and more effectively have characters sing-talk to one another? It’s something I’m keen to play around with, which I find exciting from a creative point of view.
(I have a bunch of more recent songs that I considered sharing instead, but… ehhh.)
Learning to wrangle this digital larval deity to achieve goals seems like a particularly useful skill to practise; probably more than most, these days. I worry a lot about how much I don’t know about things like basic survival - due to poor (or non-existent) parenting - so I can imagine using it for that, to finally crawl out of the pit I’m in at long last. I’m excited about the possibilities.
Next, I’ll use it similarly to figure out Laravel, the PHP framework it suggested I learn in order to rewrite my website (and personal productivity tool thing). Given how well this went, I can’t imagine that being too difficult!