DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users.
Text-to-image models now let users create high-quality images by writing natural language prompts. However, generating images with the desired details requires the right prompts, and it is often unclear how a model will respond to different prompts or what the best prompts are. To help researchers tackle these critical challenges, the authors present DiffusionDB, the first large-scale text-to-image prompt dataset. They analyze the prompts in the dataset and discuss its key properties. The unprecedented scale and diversity of this human-actuated dataset open exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools that help users use these models more easily.
Supported tasks and leaderboards
The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools that make it easier for users to adopt these models.
Loading dataset subsets
DiffusionDB is large (1.6 TB). However, thanks to its modular file structure, you can easily load a desired number of images together with their prompts and hyperparameters. The authors describe three ways to load a subset of DiffusionDB.
Method 1: Using the Hugging Face Datasets loader
The Hugging Face Datasets library makes it easy to load prompts and images from DiffusionDB. The authors predefined 16 DiffusionDB subsets (configurations) based on the number of instances.
Method 2: Using the PoloClub downloader
You can also download and load DiffusionDB with download.py, a Python downloader script included in the DiffusionDB repository. It is run from the command line.
Downloading a single file
The file to download is specified by the number at the end of its file name on Hugging Face. The script automatically zero-pads the number and generates the URL.
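To make the padding step concrete, here is a hypothetical helper (not download.py itself) that turns a file index into a zero-padded file name and URL. The six-digit `part-000001.zip` naming and the `BASE_URL` below are assumptions about the repository layout; verify them against the dataset page.

```python
# Hypothetical helper illustrating "pad the number and build the URL".
# Assumption: files are named part-000001.zip, part-000002.zip, ... under this base URL.
BASE_URL = "https://huggingface.co/datasets/poloclub/diffusiondb/resolve/main/images"


def part_file_name(index: int) -> str:
    """Return the zero-padded file name for a 1-based file index."""
    if index < 1:
        raise ValueError("file indices start at 1")
    return f"part-{index:06d}.zip"  # pad to six digits, e.g. 1 -> 000001


def part_url(index: int) -> str:
    """Return the (assumed) download URL for a 1-based file index."""
    return f"{BASE_URL}/{part_file_name(index)}"
```

For example, `part_file_name(1)` returns `"part-000001.zip"`, so a user only has to type the short number.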
Downloading multiple files
The -i and -r flags specify the lower and upper bounds of the range of files to download.
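The sketch below mimics how a lower bound (-i) and an upper bound (-r) could be turned into a list of file indices, and estimates how many files cover a target number of images. It is illustrative only: apart from the flag letters, nothing here is taken from the real download.py, the inclusive upper bound is a choice of this sketch, and `IMAGES_PER_PART = 1000` is an assumption about DiffusionDB's file layout.

```python
import argparse
import math

# Assumption: each part file holds 1,000 images (verify against the dataset card).
IMAGES_PER_PART = 1000


def file_indices(argv):
    """Parse hypothetical -i/-r flags and return the file indices to fetch."""
    parser = argparse.ArgumentParser(description="Illustrative -i/-r parsing, not download.py")
    parser.add_argument("-i", dest="lower", type=int, default=1, help="first file index")
    parser.add_argument("-r", dest="upper", type=int, default=None, help="last file index (inclusive here)")
    args = parser.parse_args(argv)
    upper = args.lower if args.upper is None else args.upper  # -i alone means one file
    if args.lower < 1 or upper < args.lower:
        parser.error("need 1 <= -i <= -r")  # prints usage and raises SystemExit
    return list(range(args.lower, upper + 1))


def parts_needed(n_images: int) -> int:
    """Number of part files needed to cover n_images, under the assumption above."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return math.ceil(n_images / IMAGES_PER_PART)
```

Under these assumptions, covering 2,500 images takes `parts_needed(2500) == 3` files, which corresponds to a range of -i 1 and -r 3.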
Method 3: Using metadata.parquet (text only)
If your task does not need images, you can directly access the prompts and hyperparameters of all 2 million images in the metadata.parquet table.
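A minimal text-only sketch: read the prompts from a local copy of metadata.parquet with pandas, then count how many prompts contain given keywords (for example, prompt "spells" such as “trending on artstation”). The `prompt` column name is an assumption; reading Parquet also needs an engine such as pyarrow installed.

```python
from collections import Counter


def load_prompts(path: str = "metadata.parquet") -> list:
    """Read only the prompt column from a local metadata.parquet file."""
    import pandas as pd  # lazy import; pandas needs a Parquet engine such as pyarrow

    # Assumption: the table stores prompts in a column named "prompt".
    df = pd.read_parquet(path, columns=["prompt"])
    return df["prompt"].tolist()


def phrase_counts(prompts, phrases) -> Counter:
    """Count how many prompts contain each phrase (case-insensitive substring match)."""
    counts = Counter()
    for prompt in prompts:
        if not isinstance(prompt, str):
            continue  # skip missing or non-text entries
        text = prompt.lower()
        for phrase in phrases:
            if phrase.lower() in text:
                counts[phrase] += 1
    return counts
```

Usage: `phrase_counts(load_prompts(), ["trending on artstation", "unreal engine"])`. Loading one column keeps memory use low, since the images are not needed for this kind of prompt analysis.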
DiffusionDB was built by scraping user-generated images from the official Stable Diffusion Discord server. The server has strict rules against generating and sharing illegal, hateful, or NSFW (not safe for work) images, and it also prohibits users from writing or sharing prompts that contain personal information.
Dataset creation
Curation rationale
Recent diffusion models have become very popular because they enable high-quality, controllable image generation from natural language text prompts. Since their release, practitioners in other fields have quickly adopted them to produce hyper-realistic videos, synthetic X-ray scans, and even award-winning artworks.
However, generating images with the desired details is difficult, because users must write prompts that specify exactly the results they expect. Such prompts are developed by trial and error and often feel random and unprincipled. One researcher compares writing prompts to wizards learning ‘magical spells’: users may not understand why certain prompts work, but they add them to their “spellbook” anyway. For example, to generate highly detailed images, it has become common practice to add special keywords such as “trending on artstation” and “unreal engine” to the prompt.
In text-to-text generation, prompt engineering has become an active area of research, in which researchers systematically study how to design prompts for various downstream tasks. Large text-to-image models, by contrast, are still in their infancy, so it is essential to understand how they respond to prompts, how to write effective prompts, and how to design tools that help users generate images. To help researchers address these critical problems, the authors built DiffusionDB, the first large-scale prompt dataset, with 2 million real prompt-image pairs.
Considerations for using the data: social impact of the dataset
This dataset aims to help researchers better understand large text-to-image generative models. The unprecedented scale and diversity of this human-actuated dataset offer exciting research opportunities in understanding the relationship between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools that make it easier for users to adopt these models.
Notably, the prompts and images were collected from the Stable Diffusion Discord server, which prohibits users from generating or sharing harmful or NSFW (not safe for work, such as sexual and violent content) images. The Stable Diffusion model used on the server also has an NSFW filter that blurs generated images when it detects NSFW content. Still, some users may have generated harmful images that the NSFW filter did not catch or that the server moderators did not remove, so DiffusionDB could contain such images. To mitigate potential harm, the authors provide a Google Form on the DiffusionDB website where users can report harmful or inappropriate images and prompts. The form is actively monitored, and any reported images or prompts will be removed from DiffusionDB.
The prompts in DiffusionDB may not be representative of prompts written by the general public, because they were collected from channels where early beta testers could use a bot to try Stable Diffusion before its public release. Since these users started using Stable Diffusion before the model was publicly available, they are likely to be experienced with other text-to-image generative models.
This article is written as a research summary by Marktechpost Staff based on the research paper 'DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models'. All credit for this research goes to the researchers on this project.
Ashish Kumar is a consulting intern at MarktechPost. He is currently pursuing his B.Tech at the Indian Institute of Technology (IIT) Kanpur. He is passionate about exploring new technological advances and applying them in real life.