A user agent is an application installed on the user’s computer that connects to a server process. Examples include web browsers, media players, and email clients such as Outlook and Thunderbird. Today the term is used mainly for clients accessing the web. Besides browsers, web user agents include search engine crawlers, cell phones, screen readers, and Braille browsers used by blind people.
“Crawler” is a generic term for any program (such as a robot or spider) that automatically discovers and scans websites by following links from one web page to another. Google’s primary crawler is Googlebot.
When Internet users visit a website, the client usually sends a text string in an HTTP header so that the server can identify the user agent. This header is part of the HTTP request, is prefixed with “User-agent:” or “User-Agent:”, and typically includes information such as the client application name, version, operating system, and language. Bots often include the owner’s web address and email address as well, so that the site administrator can contact them.
The user-agent string is one of the criteria by which bots can be excluded from certain pages using the robots.txt file. This lets webmasters block access when they believe that some parts of their site (or the whole site) should not be included in the data collected by a particular bot, or that the bot is using too much bandwidth.
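To make the header concrete, here is a minimal Python sketch that builds a request carrying a bot-style User-Agent string. The bot name, URL, and email address are hypothetical placeholders, and the request is only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical bot identity: name/version plus the owner's web and email address.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot.html; bot-admin@example.com)"

# Build (but do not send) a request that carries the header.
request = Request("https://example.com/", headers={"User-Agent": USER_AGENT})

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

The server sees this value in the request headers and can log it or use it to decide how to respond.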
Google User-Agent List
|CRAWLER||USER-AGENT||USER AGENT IN HTTP(S) REQUESTS|
|Googlebot||Googlebot||Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) or (rarely used): Googlebot/2.1 (+http://www.google.com/bot.html)|
|Googlebot Images||Googlebot-Image||Googlebot-Image/1.0|
|Googlebot Video||Googlebot-Video||Googlebot-Video/1.0|
|Google Mobile||Googlebot-Mobile||SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html) or DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)|
|Google Smartphone||Googlebot||Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html). Since April 2016: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|
|Google Mobile AdSense||Mediapartners-Google, Mediapartners (Googlebot)||[various types of mobile devices] (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)|
|Google AdSense||Mediapartners-Google, Mediapartners (Googlebot)||Mediapartners-Google|
|Google AdsBot landing page quality check||AdsBot-Google||AdsBot-Google (+http://www.google.com/adsbot.html)|
List of User-Agents used by Google spiders
By analyzing the web server log, you can trace which spider has visited the site and which pages it has requested. Knowing which spider a user agent refers to helps you understand what is happening on your website.
When the robots.txt file contains rules for different user agents, Google follows the most specific one. If you want all Google crawlers to be able to crawl your pages, you don’t need a robots.txt file at all. If you want to block or allow all Google crawlers for some of your content, specify the user agent Googlebot. For example, if you want all of your pages to appear in Google search results and you want AdSense ads to show on the pages, you don’t need a robots.txt file. Similarly, if you want to prevent Google from accessing certain pages, block the user agent Googlebot; this also blocks all other Google user agents.
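As a rough illustration of this kind of log analysis, the Python sketch below groups requested pages by Google crawler. It assumes the server writes the common "combined" log format; the sample log lines, IP addresses, and function names are invented for the example, and the tokens are the ones listed in the table above.

```python
import re
from collections import defaultdict

# Tokens from the table above. More specific tokens come first so that
# "Googlebot-Image" is not reported as plain "Googlebot".
GOOGLE_TOKENS = [
    "Googlebot-Image", "Googlebot-Video", "Googlebot-Mobile",
    "Mediapartners-Google", "AdsBot-Google", "Googlebot",
]

# Combined log format: "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def identify_crawler(user_agent):
    """Return the first Google token found in the user-agent string, or None."""
    lowered = user_agent.lower()
    for token in GOOGLE_TOKENS:
        if token.lower() in lowered:
            return token
    return None

def pages_by_crawler(log_lines):
    """Map each recognized crawler token to the list of paths it requested."""
    visits = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed format
        crawler = identify_crawler(match.group("agent"))
        if crawler:
            visits[crawler].append(match.group("path"))
    return dict(visits)

# Invented sample entries, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [10/Oct/2016:13:56:01 +0000] "GET /img/logo.png HTTP/1.1" 200 5120 "-" '
    '"Googlebot-Image/1.0"',
    '192.0.2.3 - - [10/Oct/2016:13:57:12 +0000] "GET /about.html HTTP/1.1" 200 1834 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/49.0"',
]

print(pages_by_crawler(SAMPLE_LOG))
```

With the sample lines, only the first two requests are attributed to Google crawlers; the ordinary browser visit is ignored.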
If, however, you want finer control, you can have it. For example, you may want all of your pages to appear in Google Search but keep the images in your personal directory from being crawled. In this case, use the robots.txt file to disallow the /personal directory for the user agent Googlebot-Image while leaving Googlebot free to crawl all files.
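One way to write and check such a rule is sketched below in Python. The robots.txt content is an assumed example for this scenario, and the file paths are hypothetical. The check uses the standard library's `urllib.robotparser`, which applies the first group whose token matches rather than the most specific one, so the Googlebot-Image group is listed first here.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for this scenario. The more specific group comes first
# because this parser uses the first group whose token matches the crawler.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /personal

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths used only to exercise the rules.
image_in_personal = parser.can_fetch("Googlebot-Image", "/personal/photo.jpg")
image_elsewhere = parser.can_fetch("Googlebot-Image", "/images/logo.png")
page_in_personal = parser.can_fetch("Googlebot", "/personal/index.html")

print(image_in_personal, image_elsewhere, page_in_personal)
```

The image crawler is kept out of /personal only, while Googlebot can still fetch pages everywhere.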
To take another example, let’s say you want to show ads on all of your pages but prefer those pages not to appear in Google Search. In this case you should block Googlebot but allow Mediapartners-Google.
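A sketch of this second scenario, again checked with Python's `urllib.robotparser`, might look like the following. The robots.txt content is an assumed example and the page path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block Googlebot everywhere, leave the AdSense crawler free.
ADS_ONLY_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""

ads_parser = RobotFileParser()
ads_parser.parse(ADS_ONLY_ROBOTS_TXT.splitlines())

# Hypothetical page used only to exercise the rules.
search_allowed = ads_parser.can_fetch("Googlebot", "/article.html")
adsense_allowed = ads_parser.can_fetch("Mediapartners-Google", "/article.html")

print(search_allowed, adsense_allowed)
```

An empty `Disallow:` line means nothing is disallowed for that group, which is why the AdSense crawler can still read the page.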
Some pages use several robots meta tags to give different instructions to different crawlers, for example a nofollow instruction addressed to all robots and a noindex instruction addressed to googlebot. In this case Google uses the sum of the negative instructions, and Googlebot follows both the noindex and the nofollow instructions.
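The "sum of the negative instructions" can be illustrated with a small Python sketch that collects every directive addressed either to all robots or to a given crawler. The HTML snippet and the class name are assumptions made for the example, and the sketch only mimics the combining behavior described above, not Google's actual implementation.

```python
from html.parser import HTMLParser

class RobotsMetaCollector(HTMLParser):
    """Collect robots meta directives that apply to one crawler token."""

    def __init__(self, crawler="googlebot"):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        # Tags named "robots" apply to every crawler; others only to their own.
        if name in ("robots", self.crawler):
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

# Assumed example page with one general tag and one Googlebot-specific tag.
PAGE = """<html><head>
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
</head><body></body></html>"""

collector = RobotsMetaCollector("googlebot")
collector.feed(PAGE)
print(sorted(collector.directives))
```

For the page above, the collected set contains both nofollow and noindex, which matches the combined behavior described for Googlebot.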
How to change the User-Agent in Google Chrome
You can test your pages with different User-Agents directly in Google Chrome by changing the settings under More tools > Developer tools:
How to change User-Agent with Google Chrome