Google officially announced on July 1, 2019, that Googlebot will stop obeying index-related directives in robots.txt. Publishers who rely on the robots.txt noindex directive must remove it before September 1, 2019, and switch to an alternative method.
Please stop using the robots.txt noindex directive
Although Google never formally documented it, adding a noindex directive to the robots.txt file had long worked as an unofficial, unsupported feature. Combining noindex with Disallow in robots.txt helped optimize crawl efficiency: the noindex directive kept a page out of the search results, while Disallow stopped the page from being crawled:
Disallow: /example-page-1/
Disallow: /example-page-2/
Noindex: /example-page-1/
Noindex: /example-page-2/
Many SEO practitioners also observed that Google complied with the robots.txt noindex directive most of the time. The conclusion they reached was:
“Ultimately, the noindex directive in robots.txt is quite effective. It worked in 11 of the 12 cases we tested. It may work for your site, and because of how it is implemented, it gives you a path to prevent a page from being crawled and also have it removed from the index. That is very useful in concept. However, our tests did not show 100% success, so it does not always work.”
Why did Google announce that it will no longer follow the robots.txt noindex directive?
The robots.txt noindex directive is losing support because it was never an official directive. As Google put it:
In the interest of maintaining a healthy ecosystem, and preparing for potential future open source releases, we will retire all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019.
What does this mean for sites using noindex in robots.txt?
If you have used noindex in your robots.txt file, Google will no longer honor it.
If you continue to use noindex in your robots.txt file, you will see a notification in Google Search Console.
If you give up the robots.txt noindex directive, what alternatives are available?
1. Use the noindex meta tag to block search engine indexing
To prevent search engine crawlers from indexing a page, add a noindex robots meta tag to the page's head section.
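A minimal sketch of such a page follows; the title and content are illustrative, only the meta tag matters:

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Tells compliant crawlers not to include this page in their index -->
  <meta name="robots" content="noindex">
  <title>Example page</title>
</head>
<body>
  Page content here.
</body>
</html>
```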
Alternatively, you can use the X-Robots-Tag HTTP response header to instruct crawlers not to index the page:
HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex
2. Use the 404 and 410 HTTP status codes
The 404 status code indicates that the requested page does not exist or has been deleted, i.e. the requested resource is no longer available on the server.
The 410 status code indicates that the target resource has been permanently removed from the origin server.
Both status codes tell Google that the page does not exist. Once these URLs are crawled and processed, Google removes them from its index.
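As an illustration, on an Apache server a deleted page can be made to return 410 with the mod_alias Redirect directive; the URL path below is an assumption reused from the earlier example, not a requirement:

```apache
# .htaccess — return "410 Gone" for a page that has been removed
Redirect gone /example-page-1/
```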
3. Use password protection
You can hide a page behind a login, because Google does not index content hidden behind paywalls or login screens.
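On an Apache server, a simple way to require a login is HTTP basic authentication in an .htaccess file; the realm name and password-file path below are illustrative assumptions:

```apache
# .htaccess — require a valid login before the page is served
# (AuthName and AuthUserFile path are examples, adjust for your server)
AuthType Basic
AuthName "Members only"
AuthUserFile /var/www/.htpasswd
Require valid-user
```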
4. Use robots.txt Disallow rules to block Googlebot
You can use the Disallow directive in the robots.txt file to tell search engines not to crawl selected pages. Note that Disallow only prevents crawling of a specific page; a disallowed URL can still appear in the index if other pages link to it.
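A minimal robots.txt sketch, reusing the example path from earlier in the article:

```
User-agent: Googlebot
Disallow: /example-page-1/
```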
5. Use the Remove URLs tool in Google Search Console
You can use the Remove URLs tool in Google Search Console to temporarily remove a URL from the search results; the removal lasts about 90 days. For permanent removal, use one of the four methods suggested above.
If you are using, or intend to keep using, noindex in robots.txt, you should abandon it as soon as possible. This method of keeping pages out of Google's index has completely expired.