In the previous blogpost about AI keywording services, there was one service, which we did not discuss. An elephant in the room, one might say. Using AI (ChatGPT/Claude) directly. Or, even better, building a whole “AI keywording service” yourself.
It took me way more time to write this blogpost rather than build the thing. It was that fast! The result works with multiple images at the same time and exports a CSV file that you can import to microstocks. Voilà!
TL;DR;
If you’re here just to see the tool, click here. Note, that I made small style changes for it to fit into Xpiks website. Everybody else, please read on.
Building the thing
Disclaimer about time
It took me less than 10 minutes because I happened to have some experience in software engineering. Claude made a few mistakes and while it has fixed them fully on its own eventually, sailing wouldn’t be as smooth if I was a complete newbie.
Another disclaimer is that I had an OpenAI API key already so if you’re new to this, it might take a couple of minutes more just to register.
On being lazy
If you’re using ChatGPT, it gives you the source code of whatever you’re building, but it’s not interactive. To test it, you need to copy the code back and forth to where you can try it. This is something I did not want to bother with.
However, there was an exciting development recently, called Claude Artifacts. When you generate a piece of software, Claude (the company behind models such as Claude Sonnet 3.5) can host the code for you in some form that allows its execution. At least that is valid for simple web pages.
That was enough of a reason to use Claude for this task, even though I did not have a paid account with them. Everything below was created using a free Claude account.
Prompting and testing
I signed up with Claude, confirmed my email and I was ready to start prompting. Here’s the first prompt I used:
You’re an expert web programmer. I would like you to create a web page as an artifact that allows to select a local image and this image will be sent to ChatGPT API. The API will return me a Title, Description and Keywords (comma-separated list) that will be good for microstock photography websites. Description should be below 200 characters and Title should be below 70. Website artifact that you will create should allow to customize the prompt sent to OpenAI and also to set my own API Key
Yes, I realize it’s cruel to make Claude AI create a page to use ChatGPT (its arch-competitor) but I did not have Claude API Key and I did have an OpenAI API Key so this was a necessary sacrifice to make this work.
Claude made 2 mistakes:
- It used the now defunct
gpt-4-vision-preview
model - It messed something up with the image encoding in a request to OpenAI.
I did not want to read the code, so just asked it to “fix it”, literally:
After this, the webpage was fully functional, and, although it has an obvious problem with the right margin and text overflow, it does work!
There were two more important things missing:
- I wanted to be able to upload many images at once
- I needed a CSV file in the end, not a list of text with metadata
So this is pretty much what I asked it to do, one thing at a time.
While I was testing this, I figured that I also want to be able to stop execution, if I realize a mistake in the prompt. So additionally I asked Claude to add a “Stop” button. It complied.
The final page looked a bit off in terms of styles, but it was tiresome to ask to fix the small things as Claude was regenerating the whole thing again and again. So I downloaded the file and made a couple of edits myself:
- fixed webpage style to make those margins behave
- added a spinner to spin while metadata is generating
- disabled buttons after clicking on them
Limitations
There are a couple of things to keep in mind:
- whole files are always sent to OpenAI, so it’s not the most efficient way (especially, if you have many files)
- files are sent one by one so it can take more time to get metadata (OpenAI has API rate limits anyway so sending everything in parallel is also not possible)
- only photos are supported at the moment, videos and vectors are not
- metadata is not written back to files, only a CSV file is generated. So it’s best to use software like Xpiks to import it back
- there’s a limited number of keywords generated and they are not the best, because LLMs actually don’t produce good keywords (Xpiks uses other machine learning models for that)
Integrating into this website
While this separate page was working perfectly, to make anybody use it, I wanted to integrate it into this website. So I had to actually work manually a bit more to align the styles of buttons and text fields.
Except for the FAQ items that I added later, it’s the original tool, created by Claude. It may look bare-bones, but it has everything to get you started!
Evaluating
Metadata
If you’ve read the previous blogpost, you know that I was evaluating AI keywording services on my best-seller picture:
So obviously, for comparison, we also need to use our shiny new tool by Claude. Here’s the metadata it generated:
Field | ChatGPT + Claude | The Original |
---|---|---|
Title | Climbers Conquering a Snow-Capped Mountain Ridge | Tied climbers climbing mountain with snow field tied with a rope with ice axes and helmets |
Description | A group of climbers traverses a snow-covered ridge against a clear blue sky. They are roped together, showcasing teamwork in a challenging alpine environment. | Tied climbers climbing mountain with snow field tied with a rope with ice axes and helmets |
As for the actual keywords, here’s the list: mountaineering
, snow
, climbers
, teamwork
, alpine
, adventure
, outdoor sports
, rugged terrain
, winter
, hiking
, summit
, peak
, rope
, landscape
, cold weather
. Not many “money keywords” included, but this is a super rough sketch and with some prompt engineering you can get better results.
No fuss, but by tweaking the prompt, I’m sure it will become on par with other services (maybe sans the keywords).
Price
The biggest advantage here is that having an API Key is way cheaper than subscribing to ChatGPT Pro. OpenAI has a couple of options that affect pricing, such as resolution of the patches that are used to “survey” an image, but the resulting price should be around 3-5 times cheaper than any solution mentioned in the previous blogpost (especially Pixify). The reason to pay more is, of course, the convenience.
What’s next
Here’s the actual code that Claude has generated, with a few cosmetic improvements. You can just download the file to your computer and open it in any browser. It can be taken quite far by asking for more edits and improvements from Claude and in one evening you can have your own bare-bones PhotoTag or VisualMind clone.
Although it works just fine, some corners were cut (see Limitations). If you wish to use a more professional tool, check out auto-keywording plugin for Xpiks.