AI Is Terrible at Writing Alt Text
Today is Global Accessibility Awareness Day (GAAD), a time to celebrate and explore the many ways users and publishers can make the Internet more accessible to people with disabilities.
Image alt text is a key element of web accessibility. About 7 million people in the United States have a visual disability, which can make it challenging to navigate an Internet that has become increasingly driven by graphics and videos. Many of these users access websites via screen readers, software programs that convert webpages into audio, allowing visually impaired users to navigate and interact with them.
Screen reader technology has improved dramatically as the Internet has matured, and many modern web standards make websites far more usable for visually impaired visitors. But in many cases, screen readers still have a literal blind spot: images.
Making images accessible is a major challenge, but an incredibly important one for photographers, image licensors, publishers, and web developers alike. The main technology for increasing images’ accessibility is alt text. Alt text is written text which can be attached to an image online, and often describes the visual contents of the image. It can be embedded into the metadata of the image itself, embedded into the webpage in which the image is displayed, or both.
When screen reader software finds an image that has alt text, it can read the alt text aloud, which gives the user a sense of what the image depicts, even if they can’t see its visual content. Most web platforms allow designers to add alt text to their images, and social media platforms increasingly support alt text, too.
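To make the mechanics concrete, here is a minimal sketch of how alt text sits in a webpage and how software can pull it out. It uses only Python's standard library. The page fragment, file names, and the collect_alt_text helper are invented for illustration, and the printed messages are a rough approximation: real screen readers differ in how they handle images with empty or missing alt text.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects (src, alt) pairs for every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # alt is None when the attribute is missing (or written without a
        # value), and "" when the author left it deliberately empty.
        self.images.append((attributes.get("src"), attributes.get("alt")))


def collect_alt_text(html):
    """Hypothetical helper: return a list of (src, alt) tuples found in html."""
    parser = AltTextCollector()
    parser.feed(html)
    return parser.images


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<img src="portrait.jpg" alt="A smiling person wearing purple lipstick and a purple scarf">
<img src="divider.png" alt="">
<img src="chart.png">
"""

for src, alt in collect_alt_text(SAMPLE_HTML):
    if alt is None:
        print(f"{src}: no alt attribute, so there is no description to read aloud")
    elif alt == "":
        print(f"{src}: empty alt, conventionally treated as decorative")
    else:
        print(f"{src}: would be read as: {alt}")
```

The distinction between a missing alt attribute and an empty one matters: by convention, an empty alt marks an image as purely decorative, while a missing one leaves the user with no description at all.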
Alt text is a good solution from a technical perspective. But in practice, alt text has to be well-written in order to be informative and useful. Writing alt text well is more challenging than you might expect — especially for images that depict people, events, or complex topics. Take, for example, the image at the top of this article. How would you describe it? Maybe you would say “A smiling African-American woman wearing purple lipstick and a purple scarf.” At first glance, that seems like a pretty good description of the image.
But pause for a moment and think about each part. You’ve presumably never met the person in the image. How can you know that the person is African-American? Maybe the photo was taken outside the United States. Maybe they have no connection to America at all. And for that matter, how do you know that the person depicted is a woman? You can’t look at a person and know their race or gender. Maybe the person in the image identifies as having a non-binary gender, or another gender identity.
Given how many images there are online — and the pressing challenges of accessing the web for visually impaired users — many companies have turned to Artificial Intelligence in order to automatically add alt text to images. That sounds like a good idea, but in practice, it rarely works well. If a human struggles to accurately describe an image like the one at the top of this article, then a machine will almost certainly fare worse. Sometimes much worse — AI is notoriously biased, and these biases can alter the alt text that computers generate. Sometimes the results are offensive, but more often they’re inaccurate, or just overly generic.
You can see an example of this if you use PowerPoint or another Microsoft product, as Microsoft was early to the AI alt text party. Add an image to your PowerPoint presentation, and then right-click on it and select “Edit Alt Text.” PowerPoint will pre-populate the image’s alt text field with an automatically generated description. For the image at the top of this article, it generates the text: “A picture containing person, outdoor, grass.”
On the one hand, that’s somewhat useful. The image does indeed include those attributes, and PowerPoint is smart to avoid making assumptions about the person’s race or gender. But a lot is missing from that description. The image’s emotional content, for example, is absent: the image shows joy, or at least excitement, and none of that comes through in the AI alt text. So, too, is any description of the person’s clothing or fashion choices, which appear to be a critical element of this specific image. AI-generated alt text can give a vague sense of an image’s content, but the deeper elements of its composition and meaning are often lost to AI algorithms.
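If you manage a large image library, one practical use of this observation is to flag alt text that looks machine-generated so a person can review it. The sketch below is a rough heuristic, not an established standard: the prefix list, the five-word threshold, and the alt_text_warnings function are all my own assumptions, and they will miss plenty of weak descriptions.

```python
import re

# Assumed phrases that often open auto-generated descriptions.
# This list is illustrative, not official or exhaustive.
GENERIC_PREFIXES = (
    "a picture containing",
    "image may contain",
    "an image of",
    "picture of",
)

# Alt text that ends in an image file extension is probably just a file name.
FILENAME_PATTERN = re.compile(r"\.(jpe?g|png|gif|webp)$", re.IGNORECASE)


def alt_text_warnings(alt, min_words=5):
    """Hypothetical helper: list reasons this alt text may need a human rewrite."""
    text = alt.strip()
    if not text:
        # Empty alt text is only appropriate for purely decorative images.
        return ["empty alt text"]

    warnings = []
    if text.lower().startswith(GENERIC_PREFIXES):
        warnings.append("reads like an auto-generated tag list")
    if FILENAME_PATTERN.search(text):
        warnings.append("looks like a file name")
    if len(text.split()) < min_words:
        warnings.append("very short")
    return warnings


# The PowerPoint description quoted above trips the first check.
print(alt_text_warnings("A picture containing person, outdoor, grass."))
# A bare file name trips the other two.
print(alt_text_warnings("IMG_1234.jpg"))
```

A check like this can only tell you which descriptions deserve a second look. It cannot tell you whether a description captures the emotion or context of the image, which still requires a person.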
In response to this, companies are taking two different approaches. One approach is to improve AI alt text, often by involving people in its creation. That’s the strategy applied by companies like CloudSight. The company is attempting to train AI to actually understand the content of an image and generate accurate descriptions based on that understanding, not to simply tag objects that it sees in the image. The company also specializes in hybrid recognition, which combines humans and machines for more accurate descriptions. I’ve written extensively about CloudSight before.
Related articles: “The AI That Learned To Talk Like a Politician” (a case study on achieving a balance between specificity and accuracy in NN training) and “Use AI to Write Captions for Images with Cloudsight + Python” (a Pythonic API lets you automatically write human-readable captions for your images).
The other approach is the one employed by Scribely, a company that specializes in super-high-quality, human-written alt text. Scribely’s founder Caroline Desrosiers told me that the company is organized as a “tribe” of specialized writers who band together to offer services to brands, photographers, artists, and other users who need well-researched, carefully written alt text for their content. Scribely specializes in writing alt text that captures the content and feel of images, making them come alive, even when described verbally in text read aloud through a screen reader.
For an Instagram image of the company’s founder, for example, the Scribely team wrote the alt text description: “Scribely CEO Caroline Desrosiers lounges in a wooden Adirondack chair with a cup of coffee and her laptop. A thermos rests next to her on top of a tree stump. Behind her, the lush and overgrown foliage of Sonoma Valley.” That’s a hell of a lot richer than alt text written by a machine, and generates a much clearer mental picture of what was depicted in the original image.
In fact, try this experiment. Read that description again, and be mindful of the mental picture it conjures up for you. Then, go to the original Instagram post and take a look at the real image. It’s a good bet that your mental picture and the actual image had a lot in common — or at least far more than you’d get from an AI description like “person, chair, trees.”
Now imagine that for every image you encounter online, you’re only able to have a textual description; that’s what the actual experience of surfing the web is like for the millions of Americans who use screen readers. Wouldn’t you much prefer a detailed, human-written description to one generated by a computer that labels objects without understanding the image’s context and emotional feel?
Writing human-generated alt text, of course, is expensive — at least relative to the cheap step of running an image through an AI program, which usually costs fractions of a penny. Especially for publishers and other companies who sell content, though, good alt text can actually be a source of major profits, not just a cost. Most search engines view web pages through a system that is similar to a screen reader. Like a blind user, search engine bots can’t see images — they have to rely on textual descriptions, either those in the image’s alt text or descriptions generated by their own AI.
Feeding a search engine a well-written piece of alt text gives it a much better sense of what’s present visually on a webpage, and can lead to higher rankings and more earnings from a company’s content. In its list of Search Engine Optimization best practices, Google includes alt text and tells developers to “focus on creating useful, information-rich content that uses keywords appropriately and is in context of the content of the page” when writing it. Desrosiers told me that many of her clients come to Scribely primarily to improve their search rankings, and the accessibility benefits become an added bonus. According to Desrosiers, improving accessibility is still Scribely’s primary reason for existing, but she loves the fact that alt text aligns accessibility and profits in such a positive way.
Want to learn more about alt text, accessibility, and what it’s like to use a screen reader? My industry’s nonprofit trade group, the Digital Media Licensing Association, held a webinar on 5/21/21 on Image Accessibility and SEO featuring Desrosiers, David Riecks of ControlledVocabulary.com (who helped to formulate the IPTC standards for alt text), screen reader user and accessibility consultant Natalie Trevonne, and Mark Milstein of Microstocksolutions.
The DMLA is releasing the recording of the webinar free to the public on 6/20; check back here to see it when it’s released. You can also check out alt text resources from the W3C, or from SEO company Moz if you’re primarily focused on SEO, and you can run a free audit of your own website’s accessibility using Google’s Lighthouse tool.
Especially if you work in the content industry, take a moment this Global Accessibility Awareness Day to develop a deeper understanding of alt text, and then apply what you learn at your own company, doing your part to make the web more accessible.
Even if you’re a user and not a platform creator, you can help too. When uploading content to a social media platform like Instagram — or a website like Yelp or Google Local — take a moment to write a full description of your images whenever the platform gives you the chance. You know your images better than anyone else — by writing your own description, you’re often able to prevent the platform from relying only on AI to describe your image’s content, and that can do wonders to make your content more accessible for those with visual impairments.