text-to-image Archives - AI News

Meta unveils five AI models for multi-modal processing, music generation, and more

Ryan Daws — Wed, 19 Jun 2024 15:40:48 +0000

Meta has unveiled five major new AI models and research, including multi-modal systems that can process both text and images, next-gen language models, music generation, AI speech detection, and efforts to improve diversity in AI systems.

The releases come from Meta’s Fundamental AI Research (FAIR) team, which has focused on advancing AI through open research and collaboration for over a decade. As AI advances rapidly, Meta believes working with the global community is crucial.

“By publicly sharing this research, we hope to inspire iterations and ultimately help advance AI in a responsible way,” said Meta.

Chameleon: Multi-modal text and image processing

Among the releases are key components of Meta’s ‘Chameleon’ models, made available under a research license. Chameleon is a family of multi-modal models that can understand and generate both text and images simultaneously, unlike most large language models, which are typically unimodal.

“Just as humans can process the words and images simultaneously, Chameleon can process and deliver both image and text at the same time,” explained Meta. “Chameleon can take any combination of text and images as input and also output any combination of text and images.”

Potential use cases are virtually limitless, from generating creative captions to prompting new scenes with text and images.

Multi-token prediction for faster language model training

Meta has also released pretrained models for code completion that use ‘multi-token prediction’ under a non-commercial research license. Traditional language model training is inefficient because the model predicts only the next word at each step. Multi-token models are trained to predict several future words at once, which Meta says makes training faster.

“While [the one-word] approach is simple and scalable, it’s also inefficient. It requires several orders of magnitude more text than what children need to learn the same degree of language fluency,” said Meta.
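To make the contrast concrete, here is a minimal sketch of how the training targets differ between next-word and multi-token prediction. It is illustrative only: the helper `build_targets`, the `n_future` horizon, and the toy sentence are assumptions made for this example and are not taken from Meta's released code.

```python
def build_targets(tokens, n_future=1):
    """Pair each context prefix with the next `n_future` tokens.

    n_future=1 is ordinary next-token prediction; n_future>1 is a toy
    version of multi-token prediction. Hypothetical helper, not Meta's code.
    """
    examples = []
    # Stop early enough that every example has a full set of targets.
    for i in range(1, len(tokens) - n_future + 1):
        context = tokens[:i]
        targets = tokens[i:i + n_future]
        examples.append((context, targets))
    return examples


tokens = ["the", "cat", "sat", "on", "the", "mat"]
next_token = build_targets(tokens, n_future=1)
multi_token = build_targets(tokens, n_future=3)

print(next_token[0])   # one target for the first context
print(multi_token[0])  # three targets for the same context
```

With the same context, the multi-token variant asks the model for several upcoming words per step, so each training position carries more supervision than a single next word.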

JASCO: Enhanced text-to-music model

On the creative side, Meta’s JASCO allows generating music clips from text while affording more control by accepting inputs like chords and beats.

“While existing text-to-music models like MusicGen rely mainly on text inputs for music generation, our new model, JASCO, is capable of accepting various inputs, such as chords or beat, to improve control over generated music outputs,” explained Meta.

AudioSeal: Detecting AI-generated speech

Meta claims AudioSeal is the first audio watermarking system designed to detect AI-generated speech. It can pinpoint the specific segments generated by AI within larger audio clips up to 485x faster than previous methods.

“AudioSeal is being released under a commercial license. It’s just one of several lines of responsible research we have shared to help prevent the misuse of generative AI tools,” said Meta.

Improving text-to-image diversity

Another important release aims to improve the diversity of text-to-image models, which can often exhibit geographical and cultural biases.

Meta developed automatic indicators to evaluate potential geographical disparities and conducted a large annotation study, with more than 65,000 annotations, to understand how people globally perceive geographic representation.

“This enables more diversity and better representation in AI-generated images,” said Meta. The relevant code and annotations have been released to help improve diversity across generative models.

By publicly sharing these groundbreaking models, Meta says it hopes to foster collaboration and drive innovation within the AI community.

(Photo by Dima Solomin)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Meta unveils five AI models for multi-modal processing, music generation, and more appeared first on AI News.

Stability AI previews Stable Diffusion 3 text-to-image model

Ryan Daws — Fri, 23 Feb 2024 16:49:01 +0000

London-based AI lab Stability AI has announced an early preview of its new text-to-image model, Stable Diffusion 3. The advanced generative AI model aims to create high-quality images from text prompts with improved performance across several key areas.

The announcement comes just days after Stability AI’s largest rival, OpenAI, unveiled Sora, a new AI model capable of generating nearly realistic, high-definition videos from simple text prompts.

Sora, which isn’t available to the general public yet either, sparked concerns about its potential to create realistic-looking fake footage. OpenAI said it’s working with experts in misinformation and hateful content to test the tool before making it widely available.

According to Stability AI, Stable Diffusion 3 handles multi-subject image generation significantly better than previous versions. This allows users to write more detailed prompts with multiple elements and achieve better results.

In addition to improvements with complex prompts, the new model boasts upgraded overall image quality and spelling accuracy. Stability AI claims these upgrades solve some consistency and coherence issues that have impacted past text-to-image models.

While the model is not yet publicly available, Stability AI has opened a waitlist for people interested in early access to Stable Diffusion 3. The preview phase will allow Stability AI to gather feedback and continue refining the model before a full release planned for later this year.

Stability AI said it is also working with experts to test Stable Diffusion 3 and ensure it mitigates potential harms, similar to OpenAI’s approach with Sora.

“We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment,” said Stability AI.

“In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.”

Stable Diffusion 3 is being offered in a range of model sizes, from 800 million parameters at the low end to 8 billion at the high end. Stability AI said this spectrum of options aims to balance creative performance and accessibility for users with varying computational resources.
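As a rough guide to what those sizes mean for hardware, the sketch below estimates the memory needed just to hold the weights. The figure of 2 bytes per parameter (16-bit weights) is an assumption for illustration, since the announcement does not state a precision, and real usage would be higher once activations and other components are loaded.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an illustrative
    assumption, not a figure published by Stability AI.
    """
    return num_params * bytes_per_param / 1e9


sizes = [("800 million", 800_000_000), ("8 billion", 8_000_000_000)]
for label, params in sizes:
    print(f"{label} parameters: about {weight_memory_gb(params):.1f} GB of weights")
```

Under that assumption the smallest model needs roughly a tenth of the weight memory of the largest, which is the kind of gap the range of sizes is meant to cover.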

“Our commitment to ensuring generative AI is open, safe, and universally accessible remains steadfast,” explained Stability AI.

“With Stable Diffusion 3, we strive to offer adaptable solutions that enable individuals, developers, and enterprises to unleash their creativity, aligning with our mission to activate humanity’s potential.”

(Image Credit: Stability AI)

Google Cloud announces Imagen 2 text-to-image generator

Ryan Daws — Thu, 14 Dec 2023 16:08:57 +0000

Google Cloud has introduced Imagen 2, the latest upgrade to its text-to-image capabilities.

Available for Vertex AI customers on the allowlist, Imagen 2 enables users to craft and deploy photorealistic images using intuitive tooling and fully managed infrastructure.

Developed with Google DeepMind technology, Imagen 2 offers improved image quality and a range of functionalities tailored for specific use cases.

Key features of Imagen 2 include:

Diverse image generation: Imagen 2 excels in creating high-resolution images from natural language prompts that cater to various user requirements.

Text rendering in multiple languages: Overcoming common challenges, Imagen 2 supports accurate text rendering in multiple languages.

Logo generation: Businesses can leverage Imagen 2 to create a variety of creative and realistic logos—with the option to overlay them on products, clothing, business cards, and more.

Captions and question-answering: Imagen 2’s advanced image understanding capabilities facilitate the creation of descriptive captions and provide detailed answers to questions about image elements.

Multi-language support: Imagen 2 introduces support for six additional languages in preview, with plans for more in early 2024. This includes the ability to translate between prompt and output.

Safety measures: Imagen 2 incorporates built-in safety precautions, aligning with Google’s Responsible AI principles. It features safety filters and integrates with a digital watermarking service to ensure responsible use.

Enterprise-ready capabilities

Imagen 2 on Vertex AI is designed to meet enterprise standards, offering reliability and governance akin to its predecessor. With new features such as high-quality image rendering, improved text rendering, logo generation, and safety measures, Imagen 2 aims to provide organisations with a comprehensive tool for creative image generation.

Leading companies like Snap, Shutterstock, and Canva have already embraced Imagen for creative purposes.

Chris Loy, Director of AI Services at Shutterstock, commented: “We exist to empower the world to tell their stories by bridging the gap between idea and execution.

“Variety is critical for the creative process, which is why we continue to integrate the latest and greatest technology into our image generator and editing features, as long as it is built on responsibly sourced data.”

Danny Wu, Head of AI at Canva, added: “We’re continuing to use generative AI to innovate the design process and augment imagination.

“With Imagen, our 170M+ monthly users can benefit from the image quality improvements to uplevel their content creation at scale.”

As Imagen 2 makes waves in the creative industry, organisations are encouraged to explore its potential. Google Cloud anticipates users will harness the new features to elevate their creative endeavours and build on the success achieved with Imagen.

(Photo by G on Unsplash)

Shutterstock launches AI image generator with ethical focus

Ryan Daws — Wed, 25 Jan 2023 12:58:02 +0000

Stock image platform Shutterstock has launched an AI image generator with a focus on ethical practices.

Many text-to-image generators face serious allegations over their practices. Earlier this month, AI News reported that Getty Images had filed a lawsuit against Stable Diffusion creator Stability AI over alleged copyright infringement.

An independent analysis of 12 million of the 2.3 billion images used to train Stable Diffusion found a large number of images from stock image websites and from platforms with high amounts of user-generated content, such as WordPress, DeviantArt, and Tumblr.

Many human artists have expressed concern about text-to-image generators harming their livelihoods. Understandably, they view it as an even bigger blow when their work is used – without compensation or credit – to train the generators.

Shutterstock claims its AI image generator is trained using assets that represent the diversity of the world we live in. The company says that it’s recognising the contributions of human artists by paying them royalties.

Paul Hennessy, CEO at Shutterstock, commented:

“Shutterstock has developed strategic partnerships over the past two years with key industry players like OpenAI, Meta, and LG AI Research to fuel their generative AI research efforts, and we are now able to uniquely bring responsibly-produced generative AI capabilities to our own customers.

“Our easy-to-use generative platform will transform the way people tell their stories: you no longer have to be a design expert or have access to a creative team to create exceptional work.

“Our tools are built on an ethical approach and on a library of assets that represents the diverse world we live in, and we ensure that the artists whose works contributed to the development of these models are recognised and rewarded.”

Shutterstock’s generator aims to be a “one-stop-shop” for creating images. Over 20 languages are supported and images can be created from just a single word and customised using a style picker.

You can get started with Shutterstock’s image generation platform here.

(Image Credit: Shutterstock)

Getty is suing Stable Diffusion’s creator for copyright infringement

Ryan Daws — Wed, 18 Jan 2023 09:05:33 +0000

Stock image service Getty Images is suing Stable Diffusion creator Stability AI over alleged copyright infringement.

Stable Diffusion is one of the most popular text-to-image tools. Unlike many of its rivals, the generative AI model can run on a local computer.

Apple is a supporter of the Stable Diffusion project and recently optimised its performance on Macs powered by its M-series chips. Last month, AI News reported that M2 Macs can now generate images using Stable Diffusion in under 18 seconds.

Text-to-image generators like Stable Diffusion have come under the spotlight for potential copyright infringement. Human artists have complained their creations have been used to train the models without permission or compensation.

Getty Images has now accused Stability AI of using its content and has commenced legal proceedings.

In a statement, Getty Images wrote:

“This week Getty Images commenced legal proceedings in the High Court of Justice in London against Stability AI claiming Stability AI infringed intellectual property rights including copyright in content owned or represented by Getty Images. It is Getty Images’ position that Stability AI unlawfully copied and processed millions of images protected by copyright and the associated metadata owned or represented by Getty Images absent a license to benefit Stability AI’s commercial interests and to the detriment of the content creators.

“Getty Images believes artificial intelligence has the potential to stimulate creative endeavors. Accordingly, Getty Images provided licenses to leading technology innovators for purposes related to training artificial intelligence systems in a manner that respects personal and intellectual property rights. Stability AI did not seek any such license from Getty Images and instead, we believe, chose to ignore viable licensing options and long-standing legal protections in pursuit of their stand-alone commercial interests.”

While the images used for training alternatives like DALL-E 2 haven’t been disclosed, Stability AI has been transparent about how its model is trained. However, that may now have put the company in hot water.

An independent analysis of 12 million of the 2.3 billion images used to train Stable Diffusion, conducted by Andy Baio and Simon Willison, found that the model was trained using images from the nonprofit Common Crawl, which scrapes billions of webpages monthly.

“Unsurprisingly, a large number came from stock image sites. 123RF was the biggest with 497k, 171k images came from Adobe Stock’s CDN at ftcdn.net, 117k from PhotoShelter, 35k images from Dreamstime, 23k from iStockPhoto, 22k from Depositphotos, 22k from Unsplash, 15k from Getty Images, 10k from VectorStock, and 10k from Shutterstock, among many others,” wrote the researchers.
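To put the quoted counts in proportion, the short calculation below expresses them as shares of the 12 million images analysed. The counts are the rounded figures from the quote above; because the researchers list these sites "among many others", the total is a lower bound for stock sites overall. The variable names are chosen for this example.

```python
SAMPLE_SIZE = 12_000_000  # images analysed, as reported in the article

# Counts quoted by the researchers, rounded to the nearest thousand.
stock_site_counts = {
    "123RF": 497_000,
    "Adobe Stock": 171_000,
    "PhotoShelter": 117_000,
    "Dreamstime": 35_000,
    "iStockPhoto": 23_000,
    "Depositphotos": 22_000,
    "Unsplash": 22_000,
    "Getty Images": 15_000,
    "VectorStock": 10_000,
    "Shutterstock": 10_000,
}

total = sum(stock_site_counts.values())
share_of_sample = total / SAMPLE_SIZE

print(f"Listed stock sites: {total:,} images ({share_of_sample:.1%} of the sample)")
for site, count in stock_site_counts.items():
    print(f"{site}: {count / SAMPLE_SIZE:.2%}")
```

These shares describe only the 12 million image sample, not the full 2.3 billion image training set.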

Platforms with high amounts of user-generated content such as Pinterest, WordPress, Blogspot, Flickr, DeviantArt, and Tumblr were also found to be large sources of images that were scraped for training purposes.

The concerns around the use of copyrighted content for training AI models appear to be warranted. It’s likely we’ll see a growing number of related lawsuits over the coming months and years unless a balance is found between enabling AI training and respecting the work of human creators.

In October, Shutterstock announced that it was expanding its partnership with DALL-E creator OpenAI. As part of the expanded partnership, Shutterstock will offer DALL-E images to customers.

The partnership between Shutterstock and OpenAI will see the former create frameworks that will compensate artists when their intellectual property is used and when their works have contributed to the development of AI models.

(Photo by Tingey Injury Law Firm on Unsplash)

Stable Diffusion text-to-image generator is now publicly available

Ryan Daws — Wed, 24 Aug 2022 10:54:05 +0000

Text-to-image generator Stable Diffusion is now available for anyone to put to the test.

Stable Diffusion is developed by Stability AI and was initially released for researchers earlier this month. Stability AI claims the image generator delivers a breakthrough in speed and quality and can run on consumer GPUs.

The model is based on the latent diffusion model created by CompVis and Runway but enhanced with insights from conditional diffusion models by Stability AI’s lead generative AI developer Katherine Crowson, OpenAI, Google Brain, and others.

“This model builds on the work of many excellent researchers and we look forward to the positive effect of this and similar models on society and science in the coming years as they are used by billions worldwide,” said Emad Mostaque, CEO of Stability AI.

The core model was trained on LAION-Aesthetics, a dataset that filters the 5.85 billion images in the LAION-5B dataset based on how “beautiful” an image is, building on ratings from the alpha testers of Stable Diffusion.
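The filtering idea can be sketched in a few lines: keep only the images whose predicted aesthetic score clears a threshold. This is a minimal illustration, not LAION's actual pipeline. The field name `aesthetic_score`, the threshold of 5.0, and the sample records are all hypothetical.

```python
def filter_by_aesthetic_score(records, min_score):
    """Keep only records whose predicted aesthetic score meets the threshold."""
    return [r for r in records if r["aesthetic_score"] >= min_score]


# Made-up records for illustration; field names and scores are hypothetical.
records = [
    {"url": "https://example.com/a.jpg", "aesthetic_score": 6.8},
    {"url": "https://example.com/b.jpg", "aesthetic_score": 3.1},
    {"url": "https://example.com/c.jpg", "aesthetic_score": 5.2},
]

subset = filter_by_aesthetic_score(records, min_score=5.0)
print([r["url"] for r in subset])
```

Raising the threshold shrinks the subset and biases it toward images the scoring model rates highly, which is the trade-off such a filter makes.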

Stable Diffusion runs on computers with under 10GB of VRAM and generates 512×512 pixel resolution images in just a few seconds.

“We’re excited that state-of-the-art text-to-image models are being built openly and we are happy to collaborate with CompVis and Stability.ai towards safely and ethically releasing the models to the public and help democratise ML capabilities with the whole community,” commented Apolinário, ML Art Engineer at AI community Hugging Face.

Stable Diffusion goes head-to-head against other text-to-image models including Midjourney, DALL-E 2, and Imagen.

DALL-E 2 vs Midjourney vs StableDiffusion mega thread: photography, illustration, painters, abstract

these image synths are like instruments – it's amazing we'll get so many of them, each with a unique "sound" 🤯

rules: same prompt, 1:1 aspect ratio, no living artists pic.twitter.com/47syy7uPJJ
— fabians.eth (@fabianstelzer) August 20, 2022

An interactive space to test Stable Diffusion has been created here.

(Image Credit: Fabian Stelzer)
