While the Robots Exclusion Protocol has been around for a quarter of a century, it has only ever been an unofficial standard, and that has led developers to interpret the format in different ways: one crawler might handle an edge case differently than another. Google’s initiative, which includes submitting its approach to the Internet Engineering Task Force, would “better define” how crawlers are supposed to handle robots.txt and create fewer rude surprises.
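The draft isn’t fully available, but it would apply to more than just websites (any URI-based transfer protocol, such as FTP, could use robots.txt), require crawlers to parse at least the first 500 kibibytes of a file, cap caching at one day and give sites a break when there are server problems, so that pages already known to be off-limits stay uncrawled if the robots.txt file suddenly becomes unreachable.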
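To make those rules concrete, here is a minimal crawler-side sketch. It assumes the figures Google described alongside the draft (and that later appeared in RFC 9309): crawlers parse at least the first 500 KiB of robots.txt and reuse a cached copy for no more than 24 hours. The class `RobotsPolicy`, its methods and the "disallow everything on a 5xx" back-off are illustrative assumptions, not Google’s code. Rule matching is delegated to Python’s standard `urllib.robotparser`, which predates the draft and does not implement all of its matching rules.

```python
from urllib.robotparser import RobotFileParser

# Assumed limits, based on the values Google proposed (later in RFC 9309).
PARSE_LIMIT_BYTES = 500 * 1024    # parse (at least) the first 500 KiB
MAX_CACHE_SECONDS = 24 * 60 * 60  # reuse a cached robots.txt for at most one day


class RobotsPolicy:
    """Hypothetical cached robots.txt policy for a single host."""

    def __init__(self, status: int, body: bytes, fetched_at: float):
        self.fetched_at = fetched_at
        self.parser = RobotFileParser()
        # Assumed back-off: on a server error, treat the whole site as off-limits.
        self.server_error = 500 <= status <= 599
        if 200 <= status <= 299:
            # Only the first PARSE_LIMIT_BYTES are parsed; anything later is ignored.
            text = body[:PARSE_LIMIT_BYTES].decode("utf-8", errors="replace")
            self.parser.parse(text.splitlines())
        else:
            # Any other status (e.g. 404): no usable file, so nothing is disallowed.
            self.parser.allow_all = True

    def is_stale(self, now: float) -> bool:
        # Refetch once the cached copy is a full day old.
        return now - self.fetched_at >= MAX_CACHE_SECONDS

    def can_fetch(self, user_agent: str, url: str) -> bool:
        if self.server_error:
            return False
        return self.parser.can_fetch(user_agent, url)
```

For example, `RobotsPolicy(200, b"User-agent: *\nDisallow: /private/\n", fetched_at=0.0)` blocks `https://example.com/private/page` but allows `https://example.com/public`, and reports itself stale once `now` reaches one day after `fetched_at`. A real crawler would also keep using a last-known-good copy during outages rather than relying on a blanket disallow.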
There’s no guarantee this will become a standard, at least as-is. If it does, though, it could help web visitors as much as it helps site creators. You might see more consistent web search results that respect sites’ wishes. If nothing else, this shows that Google isn’t completely averse to opening up important assets if it thinks they’ll advance both its technology and the industry at large.