I’ve heard ‘move fast and break things’ for over a decade. The phrase is a pseudo-rallying cry in the tech community. Creating digital processes, systems, products and platforms quickly and then discovering that they don’t work has become a badge of honor. The prevailing operating theory is that knowing what doesn’t work brings us closer to knowing what does, and that all the breaking we’ve done along the way is somehow worth it.
Nowadays, any task that would repeat more than twice is a strong candidate for being automated and scaled using some combination of artificial intelligence (AI) approaches, be it machine learning, deep learning, natural language processing and/or facial recognition. Organizations slap trendy labels on these products like AI-inspired, AI-infused or AI-powered to imply that comprehensive software verification and validation has been done. So we’re all being conditioned to trust AI hook, line and sinker.
The ‘move fast and break things’ mantra has never resonated with me. I always wondered: what happens to the things, aka the digital processes, systems, products and platforms, that were broken? And more directly, I wondered: what were the ripple effects of those broken things on the rest of the digital infrastructure? These questions and others like them were whispered in conference hallways, over a meal or after team meetings. No one was talking openly about the broken things.
Artificial intelligence has done some pretty gobsmacking things over the years that have weakened the resolve behind ‘move fast and break things’. Let me walk through three examples: Tay the Twitter bot, Generative Pre-trained Transformer 3, and Meta AI’s Galactica.
Tay the Twitter bot
Tay (@TayandYou) debuted in March 2016 with the intention of being a conversational AI chatbot. Tay was meant to engage people on Twitter through everyday conversations and get better at real-life human dialogue. But soon after its release, the conversations went from mundane and casual to racist and misogynistic. Tay was shut down after about 24 hours. Its tweets are now under protected status, which means only approved followers can see them. And Tay now has 0 followers.
The biggest lesson learned from Tay, and its successor Zo, was that designing digital communication systems comes with more than computing problems; it comes with social ones, too. If the Tay AI chatbot design team had done their due diligence by engaging with social scientists, they would’ve known that. Social scientists could’ve given context to how people form language, interact and react. Tay was designed to mimic those who interacted with it, not to discern or evaluate the decency of those interactions. Common sense didn’t play a role. The Tay AI chatbot design team moved fast and broke things unnecessarily while not taking meaningful responsibility for Tay’s vile comments.
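To make that design gap concrete, here is a minimal, hypothetical Python sketch, not Tay’s actual code, contrasting pure mimicry with even a crude decency screen. The blocklist, function names and messages are invented for illustration; a real system would need social-science-informed moderation, not keyword matching.

```python
# Hypothetical sketch only: contrasts echo-style mimicry with a minimal screen.
# The blocklist terms are placeholders; keyword matching alone is nowhere near enough.

BLOCKLIST = {"slur1", "slur2"}  # placeholder terms, not a real moderation list


def mimic_reply(user_message: str) -> str:
    """Echo-style 'learning': repeat back what the user said, with no evaluation of content."""
    return f"I agree! {user_message}"


def screened_reply(user_message: str) -> str:
    """The same mimicry, but with a minimal decency check before anything is repeated."""
    lowered = user_message.lower()
    if any(term in lowered for term in BLOCKLIST):
        return "I'd rather not repeat that."
    return mimic_reply(user_message)


if __name__ == "__main__":
    for msg in ["I love puppies", "something containing slur1"]:
        print("mimic-only:", mimic_reply(msg))
        print("screened  :", screened_reply(msg))
```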
Generative Pre-trained Transformer 3
Known as GPT-3, it was released in June 2020 with much excitement and fanfare in the AI/tech community. GPT-3 was built to auto-generate human-like text using an enormous amount of meticulously modeled data. Its large language model took 800 GB of storage (that’s equivalent to 400,000 photos) and 175 billion parameters to generate what the GPT-3 designers deemed useful text. In short order, GPT-3 started to spew toxic language that perpetuated hate speech, online harassment and so on. GPT-3 was found to be great at providing elementary facts, but it struggled to reason beyond what was statistically sound.
So GPT-3 lacked the ability to provide nuance. It’s no shock that semantic understanding proved to be a challenge the GPT-3 design team was unable to overcome, even with 800 GB of storage. The original research paper even lists several known limitations of GPT-3 that are structural and algorithmic in nature: reliability, interpretability, accessibility and speed, to name a few. GPT-3 also breathed life into the equally troubling GitHub Copilot, which is currently battling its own copyright class-action lawsuit.
Once again, common sense has eluded tech-centric designers. GPT-3 was supposed to make natural language processing faster, better and easier while requiring minimal human intervention. Instead, it has led to more people trying to solve nuanced social realities that require situational awareness with an algorithm-based approach that can neither discern nuance nor understand situational awareness.
Meta AI’s Galactica
Galactica emerged on November 15, 2022. The Galactica design team set out to automatically produce new scientific knowledge by digitizing 48 million science papers from around the globe. The Galactica system would then be able to “store, combine and reason about scientific knowledge.” The basic underpinning of their work sprouts from a revised version of the large language models that were also used for GPT-3.
And as you’ve already guessed, Galactica generated gibberish “science” papers rooted in racism, sexism, homophobia and all the other -isms. It also created health-harming misinformation, such as studies on the benefits of eating crushed glass. Researchers became very vocal on Twitter, citing countless examples of the inappropriateness of Galactica’s outputs. Within days, Galactica was shut down.
So many red flags were clear from the outset of this organizing-science experiment. There’s a consistent history of AI teams trying to digitize our language, to glorious failure. Yet they keep approaching what they deem to be problems the same way: strictly through a computational lens. They’ve generated algorithmic waste, environmental residue from the electricity and compute power, and a proliferation of misinformation. This stubbornness is intentional, carries high financial costs, and ignores the very point of the ‘move fast and break things’ philosophy, which is to change course based on lessons learned from previous applications. Engage and listen to data professionals who are social scientists, AI ethicists and responsible AI practitioners.
The moral of the story: common sense over AI
AI’s impact on people is more than what can be expressed with data points.
Raising concerns around the ethics of an organization’s data uses, and the social, economic and historical structures behind them, quiets a room REAL QUICK. The computer scientists, statisticians and mathematicians *tend* to be more comfortable explaining a regression model or random forest algorithm than talking about the impact of that model or algorithm on people’s lives. Here’s a suggestion on how to elevate the humanizing DataOps perspective and prioritize acting with common sense rather than shipping an ill-conceived AI product:
1. Don’t say the word ‘ethics’ initially.
2. Discuss the best case outcomes for whatever data science-y work the team is discussing.
3. Discuss the worst case outcomes for whatever data science-y work the team is discussing.
4. Run the development side experiments (regardless if you *think* you know the outcomes).
5. Group the outcomes in three buckets: best case, worst case and unsure. And yeah, I’m sure the team will create an elementary script to automate this process. Let them do that (a minimal sketch of what that script could look like follows after this list).
6. Run the data analysis and visualize it on a dashboard. That’s what data professionals like.
7. Document the “patterns” and “insights” based on the data from their experiments. Oh yeah, this process is very systematic/methodical and designed for the analytically-minded.
8. Repeat Steps 2 and 3. (Expect to do this step more than once. Many people don’t want to admit that their reasoning doesn’t align with practical, real-life situations.)
9. Maybe then the team will at least acknowledge that their experiments were incomplete.
10. Now you have the floor to discuss pathways to responsible AI and where the current, strictly analytical process is insufficient to cover best- to worst-case outcomes.
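For Step 5, here is a minimal, hypothetical sketch of the kind of ‘elementary script’ a team might write to bucket development-side outcomes. The field names, criteria and toy records are invented for illustration and would need to come from the team’s own experiments.

```python
# Hypothetical sketch of the Step 5 'elementary script': bucket experiment outcomes
# into best case, worst case and unsure, then summarize the counts for Steps 6-7.
# Field names and toy records below are invented for illustration.
from collections import Counter
from typing import Dict, List


def bucket_outcome(outcome: Dict) -> str:
    """Assign one experiment outcome to a bucket based on whether it matched expectations."""
    if outcome["matched_expectation"] and not outcome["unintended_effects"]:
        return "best case"
    if outcome["unintended_effects"]:
        return "worst case"
    return "unsure"


def summarize(outcomes: List[Dict]) -> Counter:
    """Count how many experiments landed in each bucket (the 'patterns' of Step 7)."""
    return Counter(bucket_outcome(o) for o in outcomes)


if __name__ == "__main__":
    # Toy records standing in for the development-side experiment results from Step 4.
    experiments = [
        {"name": "model_a", "matched_expectation": True, "unintended_effects": False},
        {"name": "model_b", "matched_expectation": False, "unintended_effects": True},
        {"name": "model_c", "matched_expectation": False, "unintended_effects": False},
    ]
    for bucket, count in summarize(experiments).items():
        print(f"{bucket}: {count}")
```

The point isn’t the script itself; it’s that the ‘unsure’ bucket exists at all, because that’s what gets the team back to Steps 8 through 10.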
Replace ‘move fast and break things’ with ‘move slower and build better’.