AI image generators can make a picture of anything you ask for. The results often look pretty good at first glance. They're generic and the details are usually off, but folks overlook that easily.
This hides something important: these models can't see like we do. The main limitation is how they're trained. We show them millions of pictures, paired with text descriptions.
The problem is, humans don't describe images literally. We might say "a picture of a dog playing frisbee" without mentioning the setting, the composition, or the squirrel in the background.
Most of what's there visually is unsaid. The model sees those pixels, but they're just "stuff that goes along" with the text. Dogs play in parks, so the AI learns that dogs have green backgrounds.
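Here's a toy sketch of that effect, if it helps. It's not how a real generator is trained. The dataset and the `unmentioned_cooccurrence` helper are made up for illustration. It just counts which visible things show up alongside a caption word without ever being named.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: (caption, what is actually visible in the image).
dataset = [
    ("a dog playing frisbee", {"dog", "frisbee", "grass", "trees", "squirrel"}),
    ("a dog running", {"dog", "grass", "fence"}),
    ("a dog on a couch", {"dog", "couch", "lamp"}),
    ("a cat sleeping", {"cat", "couch", "blanket"}),
]

def unmentioned_cooccurrence(pairs):
    """For each caption word, count visible things the caption never named."""
    counts = defaultdict(Counter)
    for caption, visible in pairs:
        words = set(caption.split())
        for word in words:
            # Only things that are in the picture but missing from the text.
            for thing in visible - words:
                counts[word][thing] += 1
    return counts

counts = unmentioned_cooccurrence(dataset)
print(counts["dog"].most_common(1))  # [('grass', 2)]
```

Nobody ever wrote "grass" in a caption, but it's still the thing most tied to "dog" in this made-up data.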
This is why it's so hard to control an image generator. It isn't intentionally placing all those objects and choosing their attributes. They're just extra fluff that seems to "go with" what you asked for.
(2/3)
#science #ai #generativeart