Generate and Edit Images Using OpenAI gpt-image-1 API

April 26, 2025


The last time OpenAI's ChatGPT launched an image generation model, it quickly went viral across the internet. People were captivated by the ability to create Ghibli-style portraits of themselves, turning personal memories into animated artwork. Now, ChatGPT is taking things a step further with a new natively multimodal model, "gpt-image-1", which powers image generation directly inside ChatGPT and is now available via the API. In this article we will explore the key features of OpenAI's gpt-image-1 model and how to use it for image generation and editing.

What’s gpt-image-1?

gpt-image-1 is the latest and most advanced multimodal language model from OpenAI. It stands out for its ability to generate high-quality images while incorporating real-world knowledge into the visual content. While gpt-image-1 is recommended for its strong performance, the Image API also supports other specialized models such as DALL·E 2 and DALL·E 3.

The Image API offers three key endpoints, each designed for specific tasks:

Generations: Create images from scratch using a text prompt.
Edits: Modify existing images using a new prompt, either partially or entirely.
Variations: Generate variations of an existing image (available with DALL·E 2 only).
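Since the Variations endpoint only works with DALL·E 2, a call to it looks slightly different from the gpt-image-1 examples later in this article. The sketch below assumes the official openai Python SDK's `client.images.create_variation` method and a square PNG input; the function name `make_variations` is hypothetical, and the client is passed in as an argument so the helper can be tried without network access.

```python
import base64


def make_variations(client, image_path: str, n: int = 2, size: str = "1024x1024") -> list[bytes]:
    """Request n variations of an existing image from DALL·E 2 (hypothetical helper)."""
    with open(image_path, "rb") as image_file:
        result = client.images.create_variation(
            model="dall-e-2",            # Variations is a DALL·E 2-only endpoint
            image=image_file,
            n=n,
            size=size,
            response_format="b64_json",  # DALL·E 2 returns URLs unless asked for base64
        )
    # Decode each base64 payload into raw image bytes
    return [base64.b64decode(item.b64_json) for item in result.data]
```

For example, `make_variations(OpenAI(), "photo.png", n=3)` would return three image byte strings, where `photo.png` is a placeholder path.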

OpenAI gpt-image-1 API endpoints (Source: OpenAI)


Key Features of gpt-image-1

gpt-image-1 offers several key features:

High-fidelity images: Produces detailed and accurate visuals.
Diverse visual styles: Supports a wide range of aesthetics, from photorealistic to abstract.
Precise image editing: Enables targeted changes to generated images.
Rich world knowledge: Understands complex prompts with contextual accuracy.
Consistent text rendering: Renders text within images reliably.

Availability

The OpenAI API lets users generate and edit images from text prompts using the GPT Image or DALL·E models. At present, image generation is available exclusively through the Image API, although support for the Responses API is actively being developed.

To read more about gpt-image-1, see OpenAI's official documentation.

gpt-image-1 Pricing

Before diving into how to use and deploy the model, it's important to understand the pricing to ensure effective, budget-conscious usage.

The gpt-image-1 model is priced per token, with different rates for text and image tokens:

Text input tokens (prompts): $5 per 1M tokens
Image input tokens (uploaded images): $10 per 1M tokens
Image output tokens (generated images): $40 per 1M tokens

In practical terms, this roughly equates to:

~$0.02 for a low-quality square image
~$0.07 for a medium-quality square image
~$0.19 for a high-quality square image

For more detailed pricing by image quality and resolution, refer to OpenAI's official pricing page.

gpt-image-1 image sizes and pricing (Source: OpenAI)

Note: This model generates images by first producing specialized image tokens. Therefore, both latency and overall cost depend on the number of tokens used. Larger image dimensions and higher quality settings require more tokens, increasing both time and cost.
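Because billing is per token, the cost of a request is each token count multiplied by its per-token rate, summed. The helper below is a minimal sketch of that arithmetic using the three rates listed above; the function name and the token counts in the example call are hypothetical, not measured values.

```python
# gpt-image-1 rates in USD per 1M tokens, as listed in this article
TEXT_INPUT_RATE = 5.00
IMAGE_INPUT_RATE = 10.00
IMAGE_OUTPUT_RATE = 40.00
TOKENS_PER_RATE_UNIT = 1_000_000


def estimate_cost(text_input_tokens: int, image_output_tokens: int, image_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (
        text_input_tokens * TEXT_INPUT_RATE
        + image_input_tokens * IMAGE_INPUT_RATE
        + image_output_tokens * IMAGE_OUTPUT_RATE
    ) / TOKENS_PER_RATE_UNIT


# Hypothetical request: a 100-token prompt producing 4,000 image output tokens
print(f"Estimated cost: ${estimate_cost(100, 4_000):.4f}")
```

With those illustrative numbers the estimate comes to about $0.16, almost all of it from image output tokens. For real requests, gpt-image-1 responses should also include a `usage` field with actual token counts that could be fed into the same calculation.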

How to Access gpt-image-1?

To generate the API key for gpt-image-1:

Sign in to the OpenAI platform
Go to Project > API Keys
Verify your account

To do this, first go to: https://platform.openai.com/settings/organization/general. Then, click "Verify Organization" to start the verification process. It's quite similar to any KYC verification: depending on your country, you'll be asked to upload a photo ID and then verify it with a selfie.

You may follow OpenAI's documentation on organization verification to better understand the process.


gpt-image-1: Hands-on Application

Finally, it's time to see how we can generate images using the gpt-image-1 API.

We will be using the image generation endpoint to create images from text prompts. By default, the API returns a single image, but we can set the n parameter to generate multiple images in a single request.

Before running our main code, we first need to install the SDK and set up the environment.

!pip install --upgrade openai
import os
os.environ['OPENAI_API_KEY'] = ""  # paste your OpenAI API key between the quotes

Generating Images Using gpt-image-1

Now, let's try generating an image using this new model.

Input Code:

from openai import OpenAI
import base64

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = """
A serene, peaceful park scene where humans and friendly robots are enjoying the
day together - some are walking, others are playing games or sitting on benches
under trees. The atmosphere is warm and harmonious, with soft sunlight filtering
through the leaves.
"""

result = client.images.generate(
    model="gpt-image-1",
    prompt=prompt
)

# gpt-image-1 returns the image as base64-encoded data
image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("utter_bliss.png", "wb") as f:
    f.write(image_bytes)

Output:

image generated using OpenAI gpt-image-1 API
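The n parameter mentioned earlier returns several images from one request. A minimal sketch of that, assuming the same `client.images.generate` call shown above: each entry in `result.data` carries its own `b64_json` payload, so the helper decodes and saves them in turn. The function name `generate_and_save` and the file-naming scheme are illustrative choices.

```python
import base64


def generate_and_save(client, prompt: str, n: int = 2, prefix: str = "image") -> list[str]:
    """Generate n images in one request and write each to its own PNG file."""
    result = client.images.generate(model="gpt-image-1", prompt=prompt, n=n)
    paths = []
    # result.data holds one entry per generated image
    for index, item in enumerate(result.data):
        path = f"{prefix}_{index}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(item.b64_json))
        paths.append(path)
    return paths
```

For instance, `generate_and_save(client, prompt, n=3)` would write `image_0.png` through `image_2.png` to the working directory.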

Editing Images Using gpt-image-1

gpt-image-1 offers various image editing options. The image edits endpoint lets us:

Edit existing images
Generate new images using other images as a reference
Edit parts of an image by uploading an image and a mask indicating which areas should be replaced (a process known as inpainting)
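For the second use case, gpt-image-1 can take more than one input image in a single edits request. The sketch below assumes a recent openai SDK version in which the `image` argument of `client.images.edit` accepts a list of open files; `edit_with_references` and any paths passed to it are hypothetical.

```python
import base64
from contextlib import ExitStack


def edit_with_references(client, image_paths: list[str], prompt: str, out_path: str) -> str:
    """Create a new image from several reference images via the edits endpoint."""
    with ExitStack() as stack:
        # Open every reference image; ExitStack closes them all when the block exits
        images = [stack.enter_context(open(p, "rb")) for p in image_paths]
        result = client.images.edit(model="gpt-image-1", image=images, prompt=prompt)
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
    return out_path
```

Something like `edit_with_references(client, ["mug.png", "logo.png"], "Put the logo on the mug", "mug_with_logo.png")` would then save a single combined image.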

Editing an Image Using a Mask

Let's try editing an image using a mask. We'll upload an image and provide a mask to specify which parts of it should be edited.

The transparent areas of the mask will be replaced based on the prompt, while the opaque (filled) areas will remain unchanged.
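If you don't have a mask from an image editor, one can be drawn in code. This Pillow sketch creates an RGBA mask that is opaque everywhere except a transparent rectangle, which marks the only region the model should repaint. The helper name, output filename, and rectangle coordinates are hypothetical; the mask must match the dimensions of the image it is paired with.

```python
from PIL import Image


def rectangle_mask(size: tuple[int, int], box: tuple[int, int, int, int]) -> Image.Image:
    """Return an RGBA mask that is opaque except for a transparent box.

    Transparent pixels (alpha 0) mark the region to repaint;
    opaque pixels (alpha 255) are left unchanged.
    """
    mask = Image.new("RGBA", size, (0, 0, 0, 255))  # fully opaque black
    # Overwrite the box (left, top, right, bottom) with fully transparent pixels
    mask.paste((0, 0, 0, 0), box)
    return mask


# Hypothetical example: free up the lower-left quarter of a 1024x1024 image
mask = rectangle_mask((1024, 1024), (0, 512, 512, 1024))
mask.save("mask_alpha_1024.png")
```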

Now, let me ask the model to add Elon Musk to my uploaded image.

Input Code:

from openai import OpenAI
import base64

client = OpenAI()

# Upload the source image plus an alpha mask; transparent mask pixels mark the area to repaint
with open("/content/analytics_vidhya_1024.png", "rb") as image_file, \
        open("/content/mask_alpha_1024.png", "rb") as mask_file:
    result = client.images.edit(
        model="gpt-image-1",
        image=image_file,
        mask=mask_file,
        prompt="Elon Musk standing in front of the company logo"
    )

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("Elon_AV.png", "wb") as f:
    f.write(image_bytes)

Output:

Points to note while editing an image using gpt-image-1:

The image you want to edit and the corresponding mask must have the same format and dimensions, and each must be less than 25MB in size.
The prompt should describe the entire new image, not just the portion being edited.
If you provide multiple input images, the mask will be applied only to the first image.
The mask image must include an alpha channel. If you're using an image editing tool to create the mask, make sure it's saved with the alpha channel enabled.
If you have a black-and-white image, you can use a short program to add an alpha channel and convert it into a valid mask, as shown below:

from PIL import Image
from io import BytesIO

# 1. Load your black & white mask as a grayscale image
mask = Image.open("/content/analytics_vidhya_masked.jpeg").convert("L")

# 2. Convert it to RGBA so it has room for an alpha channel
mask_rgba = mask.convert("RGBA")

# 3. Then use the mask itself to fill that alpha channel
#    (white pixels become opaque and are kept; black pixels become transparent and are edited)
mask_rgba.putalpha(mask)

# 4. Convert the mask into PNG bytes
buf = BytesIO()
mask_rgba.save(buf, format="PNG")
mask_bytes = buf.getvalue()

# 5. Save the resulting file
img_path_mask_alpha = "mask_alpha.png"
with open(img_path_mask_alpha, "wb") as f:
    f.write(mask_bytes)
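Before sending a request, the rules listed above can be checked locally. A rough sketch with Pillow follows; it assumes "25MB" means 25 × 1024 × 1024 bytes, and the function name `check_edit_inputs` is hypothetical. An empty result means the pair passes these checks, not that the API is guaranteed to accept it.

```python
import os
from PIL import Image

MAX_BYTES = 25 * 1024 * 1024  # assumed interpretation of the 25MB limit


def check_edit_inputs(image_path: str, mask_path: str) -> list[str]:
    """Return a list of problems with an image/mask pair (empty if none found)."""
    problems = []
    for path in (image_path, mask_path):
        if os.path.getsize(path) >= MAX_BYTES:
            problems.append(f"{path} is not smaller than 25MB")
    with Image.open(image_path) as image, Image.open(mask_path) as mask:
        if image.size != mask.size:
            problems.append(f"size mismatch: image {image.size} vs mask {mask.size}")
        if image.format != mask.format:
            problems.append(f"format mismatch: image {image.format} vs mask {mask.format}")
        # The mask needs an alpha ("A") band to mark which pixels may be edited
        if "A" not in mask.getbands():
            problems.append("mask has no alpha channel")
    return problems
```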

Best Practices for Using the Model

Here are some tips and best practices to follow while using gpt-image-1 for generating or editing images.

You can customize how your image looks by setting options like size, quality, file format, compression level, and whether the background is transparent. These settings let you control the final output to match your specific needs.
For faster results, use square images (1024×1024) and a lower quality setting. You can also choose portrait (1024×1536) or landscape (1536×1024) formats. Quality can be set to low, medium, or high, and both size and quality default to auto if not specified.
Note that the Image API returns base64-encoded image data. The default format is png, but we can also request jpeg or webp.
If you are using jpeg or webp, you can also specify the output_compression parameter to control the compression level (0-100%). For example, output_compression=50 sets the compression level to 50%.
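These options map onto keyword arguments of `client.images.generate`. The sketch below assumes the parameter names `size`, `quality`, `output_format`, `output_compression`, and `background` from a recent openai Python SDK; the function name and output path are hypothetical. A transparent background requires png or webp output, which is why webp is used here.

```python
import base64


def generate_webp_sticker(client, prompt: str, path: str = "sticker.webp") -> str:
    """Request a compressed WebP with a transparent background and save it."""
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1024x1024",          # square images are the fastest to generate
        quality="low",             # low | medium | high | auto
        output_format="webp",      # png (default) | jpeg | webp
        output_compression=50,     # 0-100, only applies to jpeg/webp
        background="transparent",  # requires png or webp output
    )
    with open(path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
    return path
```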

Applications of gpt-image-1

From creative design and e-commerce to education, enterprise software, and gaming, gpt-image-1 has a wide range of applications.

Gaming: content creation, sprite masks, dynamic backgrounds, character generation, concept art
Creative Tools: artwork generation, style transfer, design prototyping, visual storytelling
Education: visual aids, historical recreations, interactive learning content, concept visualization
Enterprise Software: slide visuals, report illustrations, data-to-image generation, branding assets
Advertising & Marketing: campaign visuals, social media graphics, localized content creation
Healthcare: medical illustration, patient scan visuals, synthetic image data for model training
Architecture & Real Estate: interior mockups, exterior renderings, layout previews, renovation ideas
Entertainment & Media: scene concepts, promotional material, digital doubles

Limitations of gpt-image-1

The GPT-4o Image model is a powerful and versatile tool for image generation, but there are still a few limitations to keep in mind:

Latency: More complex prompts can take up to 2 minutes to process.
Text Rendering: While significantly better than the DALL·E models, the model may still struggle with precise text alignment and clarity.
Consistency: Although it can generate visually consistent images, the model may occasionally struggle to maintain uniformity for recurring characters or brand elements across multiple images.
Composition Control: Even with improved instruction following, the model may not always place elements accurately in structured or layout-sensitive designs.
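Given that complex prompts can take up to two minutes, a short client-side timeout may cut off requests that would have succeeded. One way to allow for this, sketched below under the assumption that you use the openai Python SDK's `with_options` method, is to raise the timeout for image calls only; the 180-second value and the helper name `generate_with_timeout` are illustrative choices.

```python
def generate_with_timeout(client, prompt: str, timeout_seconds: float = 180.0):
    """Call the generations endpoint with a timeout above the ~2 minute worst case."""
    # with_options returns a copy of the client with these settings overridden
    patient_client = client.with_options(timeout=timeout_seconds, max_retries=1)
    return patient_client.images.generate(model="gpt-image-1", prompt=prompt)
```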

Model Comparison

Here's how OpenAI's gpt-image-1 compares with the popular DALL·E models:

Model	Endpoints	Features
DALL·E 2	Generations, Edits, Variations	Lower cost, supports concurrent requests, includes inpainting capability
DALL·E 3	Generations only	Higher resolution and better image quality than DALL·E 2
gpt-image-1	Generations, Edits (Responses API coming soon)	Excellent instruction following, detailed edits, real-world awareness

Conclusion

OpenAI's gpt-image-1 showcases powerful image generation capabilities, supporting both creation and editing from simple text prompts. While generating images can take some time, the quality and control it offers make it highly practical and rewarding overall.

Image generation models like this enable faster content creation, personalization, and rapid prototyping. With built-in customization options for size, quality, format, and more, plus inpainting capabilities, gpt-image-1 gives developers full and transparent control over the desired output.

While some may worry that this technology could replace human creativity, it's important to note that such tools aim to enhance human creativity and serve as helpful tools for artists. While we should certainly respect originality, we should also embrace the convenience this technology brings. We must find the right balance, where such tools help us innovate without taking away the value of authentic, human-made work.

GenAI Intern @ Analytics Vidhya | Final Year @ VIT Chennai
Passionate about AI and machine learning, I'm eager to dive into roles as an AI/ML Engineer or Data Scientist where I can make a real impact. With a knack for quick learning and a love for teamwork, I'm excited to bring innovative solutions and cutting-edge advancements to the table. My curiosity drives me to explore AI across various fields and take the initiative to delve into data engineering, ensuring I stay ahead and deliver impactful projects.
