
Google’s New Tool Allows Websites to Opt Out of AI Data Use

By Andrew Rossow

September 29, 2023

Website administrators can now choose whether content from their sites is used to help improve Google’s Bard and Vertex AI generative APIs.

According to a Sept. 28 blog post, Google-Extended lets admins control access to the content on their site and decide whether these AI models may crawl the site and use its data.

Web “crawling” grabs other web pages to create indices or collections of data, while web “scraping” downloads webpages to extract a specific set of data for analysis – e.g. product details, pricing information, SEO data, etc.

Google’s VP of Trust, Danielle Romain, wrote in the post that web publishers want more control over how their content is used, especially for “emerging generative AI use cases.”

“By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time,” she added.

Website admins can apply the control through their site’s robots.txt file, which tells web crawlers which parts of a site they may access.
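As a rough illustration of how a robots.txt rule of this kind behaves, the sketch below uses Python's standard-library robots.txt parser to check a sample file. The robots.txt contents, the example.com URL, and the helper name `is_allowed` are assumptions made for illustration, and the `Google-Extended` user-agent line simply reuses the name given in the article; Google's own documentation is the authority on exact syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: disallow the Google-Extended
# token everywhere while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-post"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Under these assumptions, the check reports the page as off limits for Google-Extended while Googlebot remains allowed, which matches the idea of opting out of AI data use without leaving search.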

This development by Google underscores the broader industry trend of offering enhanced transparency and control to web content creators.

X’s New Terms of Service Now In Effect

Earlier this month, X (formerly Twitter) revealed updated terms of service that explicitly forbid scraping and crawling its platform without prior written consent.

The updated TOS, which went into effect today, Sept. 29, introduces more rigid controls on unauthorized data collection methods. The previous version of the TOS allowed crawling so long as it complied with the guidelines outlined in the robots.txt file.

In August, The New York Times made its position clear by updating its terms of service to bar its content from being used to train any machine learning system or AI model, such as ChatGPT.

OpenAI, the company behind ChatGPT, nevertheless launched its own web crawler, GPTBot, that same month to gather content from across the internet for training and enhancing its AI capabilities.

OpenAI is no stranger to copyright infringement claims, and privacy advocates are understandably concerned.

Editor’s note: This article was written by an nft now staff member in collaboration with OpenAI’s GPT-4.