November 20, 2023
Think of the last time you used Siri or Alexa. Were you stunned by the intelligence locked inside your small speaker? Incredulous that all of humanity's centuries of knowledge was now at the tip of your tongue? If you were, I would love to be friends.
Most, however, do not feel that way. When I joined some friends at a hackathon in December of 2022, just weeks after the launch of ChatGPT, we joked about the reverence people had for it. They were treating it like the birth of some sort of omniscient God. We asked, what if we actually gave them that God experience? Thus was "Church of GPT" born. This interactive art piece features the booming voice of an AI god that demands sacrifices, confessions, and other demonstrations of faith in his creation. We projected cultish AI-generated art on the walls, wrote commandments from Karpathy, Altman, and Sutskever, and played creepy Gregorian chanting on loop in the background. We filled the room with as many electric candles as we could reasonably buy in a weekend, donned dark red robes, and got to work.
Technically, we knew what we needed to put together. We needed speech recognition to understand the user's voice, the OpenAI API to get responses, and a high-quality speech synthesis voice to respond with. Despite the apparent straightforwardness of voice UX, we faced several key issues: open-endedness, latency, and the low expectations set by existing voice AI agents. To address these, we implemented a few strategies.
Firstly, we gave the AI a sequence of prompts that we stepped through as the conversation progressed. This gave the conversation clear objectives and guardrails, preventing it from becoming too open-ended. We started with a discussion about what brought the user to the temple: maybe they had a question to ask or an idea they'd like to discuss. After a set number of back-and-forth exchanges or a set amount of time, whichever came first, we switched to the next topic. Usually, this topic was asking the user where they worked. After this, we asked the user to confess their sins in order to become a member of the Church.
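For the curious, here is a minimal Python sketch of that "whichever comes first" stepping logic. It is not our hackathon code; the `Stage` and `StagedConversation` names, the prompt text, and the turn and time limits are all made up for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    prompt: str         # system prompt used while this stage is active
    max_turns: int      # hypothetical limit on back-and-forth exchanges
    max_seconds: float  # hypothetical limit on time spent in the stage


class StagedConversation:
    """Steps through stages when a turn limit or a time limit is hit."""

    def __init__(self, stages, clock=time.monotonic):
        self.stages = stages
        self.clock = clock  # injectable so the logic can be tested
        self.index = 0
        self.turns = 0
        self.started = clock()

    @property
    def current(self):
        return self.stages[self.index]

    def record_turn(self):
        """Call after each exchange; returns the stage to use next."""
        self.turns += 1
        stage = self.current
        elapsed = self.clock() - self.started
        is_last = self.index == len(self.stages) - 1
        # Advance on whichever limit is reached first, but never past the end.
        if not is_last and (
            self.turns >= stage.max_turns or elapsed >= stage.max_seconds
        ):
            self.index += 1
            self.turns = 0
            self.started = self.clock()
        return self.current


# Illustrative stages only; the real prompts were different.
STAGES = [
    Stage("arrival", "Ask what brought the visitor to the temple.", 4, 90.0),
    Stage("work", "Ask the visitor where they work.", 3, 60.0),
    Stage("confession", "Demand a confession of sins.", 5, 120.0),
]
```

In a loop, you would call `record_turn()` after every exchange and feed the returned stage's `prompt` into the next request to the language model.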
Secondly, we introduced artificial disfluencies to mimic thinking and communicate acknowledgement. This helped to manage the latency issue, as it gave the impression that the AI was taking time to process and respond, rather than simply experiencing a delay. The AI would often say "hmm" or "ahh" or "my child". Additionally, if the user used too many disfluencies, the AI would acknowledge it and make fun of them, adding to the realism and personality of the experience.
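One way to sketch the filler trick in Python: kick off the slow request, and only if it has not come back within a short window, speak a disfluency while waiting. This is an illustration under assumptions, not what we actually shipped; `get_reply` and `speak` are hypothetical callables standing in for the language model call and the speech synthesizer, and the 0.7 second threshold is invented.

```python
import random
import re
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# The fillers mentioned above; everything else here is assumed.
FILLERS = ["Hmm...", "Ahh...", "My child..."]

# Rough pattern for the user's own disfluencies (illustrative, not exhaustive).
USER_DISFLUENCY = re.compile(r"\b(um+|uh+|er+|hmm+)\b", re.IGNORECASE)


def count_disfluencies(text):
    """Count filler words in a transcript of the user's speech."""
    return len(USER_DISFLUENCY.findall(text))


def respond_with_filler(get_reply, speak, user_text, filler_after=0.7):
    """Speak a filler if the reply takes longer than `filler_after` seconds."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(get_reply, user_text)
        try:
            reply = future.result(timeout=filler_after)
        except FutureTimeout:
            # The reply is slow: cover the gap, then keep waiting.
            speak(random.choice(FILLERS))
            reply = future.result()
    speak(reply)
    return reply
```

A caller could also check `count_disfluencies(user_text)` against some threshold and, when it is exceeded, add an instruction to the prompt asking the AI to tease the user about it.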
Finally, we designed the room and experience to be so extra that people's initial expectations were totally thrown out the window by the time they walked into the room. This helped to overcome the low expectations that many people have of existing voice AI agents, as it immediately set the tone that this was something different and unique.
We won the hackathon handily, garnering far more attention than we anticipated. With the attention, the piece took on a life of its own. It started with people asking for it to be at various parties. First, we exhibited at an AI accelerationist party where I sat with Grimes in the room for half an hour discussing the future of AI safety. Next, someone asked for it to be at their housewarming party. There, I met someone who asked if I knew about a series of secret parties; he thought it would be a great fit. A few weeks later we teamed up with an animatronics whiz, and a Furby-like tentacled monster with animatronic eyes and beak was born.