The problem is that what birds look like is very well defined compared to text. That is, there are many types of birds, but each type has a "consistent image," whereas there are infinite variations of text.
Fair enough. You could certainly still do it with Mechanical Turk then, although the delay in response might not meet the requester's unstated requirements.