Synthesia is a piano game, and its Guitar-Hero-style falling notes have become a popular alternative to sheet music in the YouTube era, with plenty of tutorial videos using this format (for example).
My piano teacher thinks that making these videos myself is useful – I can see visually, e.g., what chord patterns I'm playing, and how that compares with what the better pianists on YouTube do. Hidden at the bottom of the download page is an app that, I am told, converts MIDI files to video. But it's only available for 32-bit Vista and macOS.
So, this leaves me, an Ubuntu user, with two tasks: recording a MIDI file, then converting the MIDI to video.
My digital piano is a Casio CDP-120, which has a USB port that I can connect to my laptop. I see people on the Internet writing that they successfully got their keyboards to talk to whichever software packages (Ardour is a popular choice), but I had no luck with that, and resorted to using the command line.
First, run arecordmidi -l to see if it recognises the keyboard, and, if so, which port it is connected to:
Port Client name Port name
14:0 Midi Through Midi Through Port-0
20:0 CASIO USB-MIDI CASIO USB-MIDI MIDI 1
My keyboard is connected to port 20. To record, run
arecordmidi --port 20 output_file.mid
It will continue recording until you press Ctrl-C to kill the process. I have a(n occasionally inelegant) shell script that automates finding the port and choosing a file name (audio001.mid, audio002.mid, ...), which I run in a terminal.
#!/bin/bash
# Assumes that the input MIDI device has a name containing 'CASIO';
# edit the second-last line as appropriate, or set the port directly
# in the final line.
LIST=`ls audio*.mid 2>/dev/null | sed 's/audio\([0-9]\+\).*/\1/g' | tr '\n' ' '`
if [ -z "$LIST" ]
then
    next_file_03d="audio001.mid"
else
    arr=($LIST)
    last_file_i=$((${#arr[@]}-1))
    last_file=${arr[$last_file_i]}
    next_file=$((10#$last_file+1))
    next_file_03d=`printf audio%03d.mid $next_file`
fi
PORT=`arecordmidi -l | grep CASIO | sed 's/:.*//' | sed 's/ //'`
arecordmidi --port "$PORT" "$next_file_03d"
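For reference, the file-numbering part of that script is easy to sketch in Python as well – next_filename here is a hypothetical helper, not something the workflow actually uses:

```python
import re

def next_filename(existing):
    # Extract the numeric part of names like "audio001.mid" and
    # return the next name in the sequence.
    numbers = [int(m.group(1)) for f in existing
               if (m := re.match(r"audio(\d+)\.mid$", f))]
    n = max(numbers) + 1 if numbers else 1
    return "audio{:03d}.mid".format(n)

print(next_filename([]))                                # audio001.mid
print(next_filename(["audio001.mid", "audio007.mid"]))  # audio008.mid
```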
I wrote a Python script to parse the MIDI file, which I assume has only one track (the piano). It uses the mido library to import the MIDI file, and Pillow to output individual frames of the video as PNGs. The end of the script makes two calls to external programs: one to Timidity (sudo apt-get install timidity), which converts the MIDI file to WAV, and one to FFmpeg, which puts it all together into a video.
It took a few minutes to generate this video, which is a little over a minute long.
I am better at the scripting than at the piano.
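One note on timing before the script: MIDI delta times are in ticks, and converting ticks to seconds requires the tempo. The script below hard-codes the MIDI default of 120 beats per minute (500,000 microseconds per beat), which is why it divides by 2*ticks_per_beat. The general conversion, as a sketch:

```python
DEFAULT_TEMPO = 500000  # microseconds per beat; equivalent to 120 BPM

def ticks_to_seconds(ticks, ticks_per_beat, tempo=DEFAULT_TEMPO):
    # seconds = beats * seconds-per-beat
    return (ticks / ticks_per_beat) * (tempo / 1e6)

# At 120 BPM with 480 ticks per beat, 960 ticks is one second:
print(ticks_to_seconds(960, 480))  # 1.0
# which matches the script's shortcut of dividing by 2*ticks_per_beat:
print(960 / (2*480))               # 1.0
```

If a MIDI file contains set_tempo messages, the tempo should be read from those instead of assumed; my Casio's recordings don't, so the shortcut works here.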
import mido
import PIL.Image
import numpy as np
import os
import subprocess
input_midi = "haunted.mid"
frames_folder = "single_frames"
image_width = 1280
image_height = 720
piano_height = round(image_height/6)
black_key_height = 2/3
pressed_key_colour = [200, 128, 128]
# Speed in main-image-heights per second
vertical_speed = 1/4
fps = 25
main_height = image_height - piano_height
time_per_pixel = 1/(main_height*vertical_speed)
pixels_per_frame = main_height*vertical_speed / fps
# Only used in the print-out of the notes; not relevant to the video:
accidentals = "flat"
white_notes = {0: "C", 2: "D", 4: "E", 5: "F", 7: "G", 9: "A", 11: "B"}
sharp_notes = {1: "C#", 3: "D#", 6: "F#", 8: "G#", 10: "A#"}
flat_notes = {1: "Db", 3: "Eb", 6: "Gb", 8: "Ab", 10: "Bb"}
white_notes_scale = {0: 0, 2: 1, 4: 2, 5: 3, 7: 4, 9: 5, 11: 6}
note_names = {}
def note_breakdown(midi_note):
    note_in_chromatic_scale = midi_note % 12
    octave = round((midi_note - note_in_chromatic_scale) / 12 - 1)
    return [note_in_chromatic_scale, octave]
for note in range(21, 109):
    [note_in_chromatic_scale, octave] = note_breakdown(note)
    if note_in_chromatic_scale in white_notes:
        note_names[note] = "{}{:d}".format(
            white_notes[note_in_chromatic_scale], octave)
    else:
        if accidentals == "flat":
            note_names[note] = "{}{:d}".format(
                flat_notes[note_in_chromatic_scale], octave)
        else:
            note_names[note] = "{}{:d}".format(
                sharp_notes[note_in_chromatic_scale], octave)
def is_white_key(note):
    return (note % 12) in white_notes
input_file = mido.MidiFile(input_midi)
track = input_file.tracks[0]
ticks_per_beat = input_file.ticks_per_beat
# The 'notes' list will store each note played, with start and end
# times in seconds.
notes = []
notes_on = 0
# The MIDI file comprises a number of messages. The time given in
# a message is the time since the previous message, in units of
# ticks. Dividing by 2*ticks_per_beat converts ticks to seconds,
# assuming the MIDI default tempo of 120 beats per minute.
current_t = 0
for msg in track:
    if msg.type == "note_on" and msg.velocity > 0:
        notes.append({"note": msg.note,
                      "start": (current_t + msg.time)/(2*ticks_per_beat),
                      "end": 0})
        notes_on += 1
    elif msg.type == "note_off" or msg.type == "note_on":
        # (A note_on with velocity 0 is equivalent to a note_off.)
        # Loop backwards to find which note just ended:
        for i in range(notes_on - 1, -1, -1):
            if notes[i]["note"] == msg.note:
                notes[i]["end"] = (current_t + msg.time)/(2*ticks_per_beat)
                break
    current_t += msg.time
# Print-out of the notes, to check that the file has been parsed
# correctly:
for note in notes:
    print("Note = {}, start = {:.2f}, duration = {:.2f}".format(
        note_names[note["note"]],
        note["start"],
        note["end"] - note["start"]))
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~ The rest of the code is about making the video. ~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
def pixel_range(midi_note, image_width):
    # Returns the min and max x-values for a piano key, in pixels.
    width_per_white_key = image_width / 52
    if is_white_key(midi_note):
        [in_scale, octave] = note_breakdown(midi_note)
        offset = 0
        width = 1
    else:
        [in_scale, octave] = note_breakdown(midi_note - 1)
        offset = 0.5
        width = 0.5
    white_note_n = white_notes_scale[in_scale] + 7*octave - 5
    start_pixel = round(width_per_white_key*(white_note_n + offset)) + 1
    end_pixel = round(width_per_white_key*(white_note_n + 1 + offset)) - 1
    if width != 1:
        mid_pixel = round(0.5*(start_pixel + end_pixel))
        half_pixel_width = 0.5*width_per_white_key
        half_pixel_width *= width
        start_pixel = round(mid_pixel - half_pixel_width)
        end_pixel = round(mid_pixel + half_pixel_width)
    return [start_pixel, end_pixel]
if not os.path.isdir(frames_folder):
    os.mkdir(frames_folder)
# Delete all previous image frames:
for f in os.listdir(frames_folder):
    os.remove("{}/{}".format(frames_folder, f))
im_base = np.zeros((image_height, image_width, 3), dtype=np.uint8)
# Draw the piano, and the grey lines next to the C's for the main area:
key_start = image_height - piano_height
white_key_end = image_height - 1
black_key_end = round(image_height - (1-black_key_height)*piano_height)
im_lines = im_base.copy()
for i in range(21, 109):
    if is_white_key(i):
        [x0, x1] = pixel_range(i, image_width)
        im_base[key_start:white_key_end, x0:x1] = [255, 255, 255]
        if i % 12 == 0:
            # C
            im_lines[0:(key_start-1), (x0-2):(x0-1)] = [80, 80, 80]
for i in range(21, 109):
    if not is_white_key(i):
        [x0, x1] = pixel_range(i, image_width)
        im_base[key_start:black_key_end, x0:x1] = [0, 0, 0]
im_piano = im_base[key_start:white_key_end, :]
im_frame = im_base.copy()
im_frame += im_lines
# Timidity (the old version that I have!) always starts the audio
# at time = 0. Add a second of silence to the start, and also
# keep making frames for a second at the end:
frame_start = notes[0]["start"] - 1
end_t = max(note["end"] for note in notes) + 1
# First frame:
for j in range(main_height):
    im_j = main_height - j - 1
    t = frame_start + time_per_pixel*j
    for note in notes:
        if note["start"] <= t <= note["end"]:
            [x0, x1] = pixel_range(note["note"], image_width)
            im_frame[im_j, x0:x1] = [255, 0, 0]
img = PIL.Image.fromarray(im_frame)
img.save("{}/frame00000.png".format(frames_folder))
# Rest of video:
finished = False
frame_ct = 0
pixel_start = 0
pixel_start_rounded = 0
print("Starting images")
while not finished:
    frame_ct += 1
    if frame_ct % 100 == 0:
        print(frame_ct)
    prev_pixel_start_rounded = pixel_start_rounded
    pixel_start += pixels_per_frame
    pixel_start_rounded = round(pixel_start)
    pixel_increment = pixel_start_rounded - prev_pixel_start_rounded
    frame_start += 1/fps
    # Copy most of the previous frame into the new frame:
    im_frame[pixel_increment:main_height, :] = im_frame[0:(main_height - pixel_increment), :]
    im_frame[0:pixel_increment, :] = im_lines[0:pixel_increment, :]
    im_frame[key_start:white_key_end, :] = im_piano
    # Which keys need to be coloured?
    keys_to_colour = []
    for note in notes:
        if note["start"] <= frame_start <= note["end"]:
            keys_to_colour.append(note["note"])
    # Draw the new pixels at the top of the frame:
    for j in range(pixel_increment):
        t = frame_start + time_per_pixel*(main_height - j - 1)
        for note in notes:
            if note["start"] <= t <= note["end"]:
                [x0, x1] = pixel_range(note["note"], image_width)
                im_frame[j, x0:x1] = [255, 0, 0]
    # First colour the white keys (this will cover some black-key pixels),
    # then re-draw the black keys on either side,
    # then colour the black keys:
    for note in keys_to_colour:
        if is_white_key(note):
            [x0, x1] = pixel_range(note, image_width)
            im_frame[key_start:white_key_end, x0:x1] = pressed_key_colour
    for note in keys_to_colour:
        if is_white_key(note):
            if (not is_white_key(note - 1)) and (note > 21):
                [x0, x1] = pixel_range(note - 1, image_width)
                im_frame[key_start:black_key_end, x0:x1] = [0, 0, 0]
            if (not is_white_key(note + 1)) and (note < 108):
                [x0, x1] = pixel_range(note + 1, image_width)
                im_frame[key_start:black_key_end, x0:x1] = [0, 0, 0]
    for note in keys_to_colour:
        if not is_white_key(note):
            [x0, x1] = pixel_range(note, image_width)
            im_frame[key_start:black_key_end, x0:x1] = pressed_key_colour
    img = PIL.Image.fromarray(im_frame)
    img.save("{}/frame{:05d}.png".format(frames_folder, frame_ct))
    if frame_start >= end_t:
        finished = True
print("Calling Timidity")
subprocess.call("timidity {} -Ow --output-24bit -A120 -o output.wav".format(input_midi).split())
print("Calling ffmpeg")
# Running from a terminal, the long filter_complex argument needs to
# be in double-quotes, but the list form of subprocess.call requires
# _not_ double-quoting.
ffmpeg_cmd = "ffmpeg -framerate 25 -i {}/frame%05d.png -i output.wav -f lavfi -t 0.1 -i anullsrc -filter_complex [1]adelay=1000|1000[aud];[2][aud]amix -c:v libx264 -vf fps=25 -pix_fmt yuv420p -y out.mp4".format(frames_folder)
subprocess.call(ffmpeg_cmd.split())
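As a quick sanity check of the geometry the script relies on: an 88-key piano spans MIDI notes 21 (A0) to 108 (C8) and has 52 white keys, which is where the image_width / 52 in pixel_range comes from. A self-contained version of the white-key test confirms the counts:

```python
white_notes = {0, 2, 4, 5, 7, 9, 11}  # C D E F G A B within an octave

def is_white_key(note):
    return (note % 12) in white_notes

keys = range(21, 109)  # the 88 keys of a full-size piano
n_white = sum(is_white_key(n) for n in keys)
n_black = sum(not is_white_key(n) for n in keys)
print(n_white, n_black)  # 52 36
```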
Posted 2019-03-10.