PenPal

July 17, 2022

PenPal is an AI writing assistant with a twist: it reads and writes on paper with a pen, just like you do. It took some Gaussian neural network magic to get it to write realistically, but it was well worth it. Piped together with GPT-3 and Google Cloud OCR, the result is an end-to-end conversational agent that communicates through physical handwriting.

It all started with messing around on the shop floor at ITP. A week prior, I had given a presentation on the power of GPT-3. We were all invited to share our expertise with the group, and given my experience at OpenAI, I thought it was the least I could do. Color me surprised when I realized most of the artists had little to no interest in the technology after the presentation. They thought it was well explained; they just didn't get the hype. To them, it was just a fancy autocomplete engine. I showed them samples of conversations, but it wasn't clicking. That was, until I found an old Axi-Draw on the shop floor. It needed some fixing up, but once it was working I started playing with it. It drew beautifully, and I began to wonder... could this thing write? I started by experimenting with fonts, but even with nice single-stroke fonts the handwriting looked robotic and unnatural. I needed something more. I needed to get the computer to learn how to write like a human does.

Some of my earliest research in deep learning and neural networks came from Alex Graves' seminal work on synthesizing sequences with recurrent neural networks. He demonstrated two hallmark applications in his paper: handwriting synthesis and speech transcription. I had great success with the latter, but never tried the former. Nonetheless, I knew that if I could get the Axi-Draw to draw, I could get it to write.

I discovered Sean Vasquez's incredible repository demonstrating handwriting synthesis with LSTMs and Gaussian mixture models, and got started tying all the pieces together. A day later I had a simple script running that could convert arbitrary text into a sequence of pen strokes. I was blown away. I had a machine that could write in a number of styles, with punctuation and natural pauses.
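To give a feel for the Gaussian mixture part, here is a minimal sketch of how a single pen movement can be sampled from a mixture of bivariate Gaussians and accumulated into a stroke. It is an illustration under my own assumptions, not code from the repository: the names `Component`, `sample_offset` and `offsets_to_points` are hypothetical, and the mixture parameters are fixed by hand here, whereas a trained LSTM would predict fresh ones at every step.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Component:
    """One bivariate Gaussian in the mixture (hypothetical container)."""
    weight: float   # mixture weight; weights across components sum to 1
    mu_x: float     # mean horizontal offset
    mu_y: float     # mean vertical offset
    sigma_x: float  # standard deviation of the horizontal offset
    sigma_y: float  # standard deviation of the vertical offset
    rho: float      # correlation between dx and dy, in (-1, 1)


def sample_offset(components, pen_up_prob, rng):
    """Sample one (dx, dy, pen_up) step from the mixture."""
    # Pick a component according to the mixture weights.
    comp = rng.choices(components, weights=[c.weight for c in components], k=1)[0]
    # Draw two independent standard normals and correlate them via rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    dx = comp.mu_x + comp.sigma_x * z1
    dy = comp.mu_y + comp.sigma_y * (comp.rho * z1 + math.sqrt(1.0 - comp.rho ** 2) * z2)
    # A Bernoulli draw decides whether the pen lifts after this step.
    pen_up = rng.random() < pen_up_prob
    return dx, dy, pen_up


def offsets_to_points(offsets, start=(0.0, 0.0)):
    """Accumulate relative offsets into absolute (x, y, pen_up) points."""
    x, y = start
    points = []
    for dx, dy, pen_up in offsets:
        x += dx
        y += dy
        points.append((x, y, pen_up))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    # Hand-picked parameters for illustration only.
    mixture = [
        Component(0.7, 1.0, 0.0, 0.3, 0.3, 0.2),
        Component(0.3, 0.2, 0.8, 0.2, 0.4, -0.1),
    ]
    offsets = [sample_offset(mixture, 0.05, rng) for _ in range(20)]
    for point in offsets_to_points(offsets)[:5]:
        print(point)
```

The randomness is what keeps the output from looking like a font: the same letter comes out slightly different every time, and the pen-up flag is where the natural pauses come from.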

Soon after that, the whole setup was quite clear in my head. For people to chat with it, we needed to design a setup that invited interaction rather than just showing off a capability. I recruited two friends to join me, and we devised a simple setup with a laser-cut wooden assembly, some nice paper cards, a webcam and a laptop. We called it PenPal.
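Conceptually, one turn of the conversation is: photograph the card, read the handwriting, ask the language model for a reply, and hand the reply to the plotter. The sketch below shows that loop with the three stages passed in as plain functions. Everything here is hypothetical scaffolding for illustration (`run_turn`, `read_handwriting`, `generate_reply` and `write_on_paper` are names I made up, and the speaker labels in the prompt are an assumption); the real stages would call the OCR service, GPT-3 and the handwriting model plus plotter.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]


def run_turn(
    image_bytes: bytes,
    history: History,
    read_handwriting: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    write_on_paper: Callable[[str], None],
) -> Optional[str]:
    """Run one read -> reply -> write turn; returns the reply, or None if the card was blank."""
    # Stage 1: OCR the photographed card (stage supplied by the caller).
    user_text = read_handwriting(image_bytes).strip()
    if not user_text:
        return None
    history.append(("Human", user_text))

    # Stage 2: build a transcript-style prompt and ask the language model.
    prompt = "\n".join(f"{speaker}: {text}" for speaker, text in history) + "\nPenPal:"
    reply = generate_reply(prompt).strip()
    history.append(("PenPal", reply))

    # Stage 3: send the reply to the handwriting synthesizer and plotter.
    write_on_paper(reply)
    return reply


if __name__ == "__main__":
    # Stand-in stages so the loop can be exercised without any hardware or API keys.
    written: List[str] = []
    history: History = []
    reply = run_turn(
        b"fake-image",
        history,
        read_handwriting=lambda img: "Hello there!",
        generate_reply=lambda prompt: " Hi! Nice to meet you.",
        write_on_paper=written.append,
    )
    print(reply)
    print(history)
```

Keeping the stages as injected functions makes it easy to test the conversation logic on a laptop before any paper or pens are involved.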

At the final demos event, people were blown away. They were chatting with a machine, and it was writing back to them. It was a magical experience. We had a line of people waiting to chat with it, and it was a great success. People loved taking the pieces of paper home with them as souvenirs, and some folks even left in tears. Feel free to try setting it up yourself if you have an Axi-Draw and a spare webcam. The code is all open source: GitHub Repo