
The Coc Coc robot (bot, crawler) is a web crawling robot made by the Coc Coc search engine. The robot discovers and downloads websites/pages in order to include them in Coc Coc’s search engine index. By allowing our robot to crawl your website, you can make your site more popular by increasing the number of users who are able to find your content.

Below you can find the answers to some common questions about the Coc Coc robot and search.

1. I can't find my site/pages in the search results on coccoc.com/search.

You can check whether we have added any pages from your site to our index by using the “site:” operator. Go to coccoc.com/search and enter the query "site:yourwebsite.com yourwebsite.com". For example: site:coccoc.com coccoc.com.

If your query retrieves no results, there are several potential reasons why your site may not be included in our index.

First, make sure that your pages can be found by our robot. The robot is a program that crawls the web by following links. It downloads and indexes pages, extracts URLs from the pages it has discovered, and then downloads the pages at these new URLs. This process is repeated in a loop. Thus, in order for your site to be discoverable by our robot, it needs to be linked to by other sites, and these links must not be marked with the nofollow instruction. Further, our robot is designed for the Vietnamese part of the Internet and attempts to limit itself to content in Vietnamese. To accomplish this, it extracts links from pages written in Vietnamese and from .vn domains in general, so the links to your site must appear on such pages. If your site is brand new, it might also take us some time to find the pages that link to it.
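The discovery loop described above can be sketched roughly as follows. This is an illustrative breadth-first crawl, not Coc Coc's actual implementation: the `LinkExtractor` class, the `crawl` function, and the in-memory `PAGES` dictionary are all hypothetical names, and the sketch leaves out the Vietnamese-language filtering.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, skipping links marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if href and "nofollow" not in rel:
            self.links.append(href)


def crawl(seed, fetch):
    """Fetch a page, extract its followable links, and repeat for each new URL."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be downloaded
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://a.example.vn/": '<a href="http://b.example.vn/">b</a>'
                            '<a rel="nofollow" href="http://c.example.vn/">c</a>',
    "http://b.example.vn/": '<a href="http://a.example.vn/">back</a>',
    "http://c.example.vn/": "only linked with nofollow",
}

discovered = crawl("http://a.example.vn/", PAGES.get)
print(sorted(discovered))  # ['http://a.example.vn/', 'http://b.example.vn/']
```

In this toy example, the third page is never discovered because the only link pointing to it carries nofollow.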

If your site has such links, check that the content those URLs link to is reachable by our robot and allowed to be added to our index. Check your robots.txt file and make sure it doesn't disallow our robot from downloading your pages. Further, make sure there is no noindex instruction in the content of your pages. Also, your site (or your Internet service provider, ISP) may be blocking our robot, either by its user agent string or by its IP addresses. Contact your system administrators/ISP to check if this is the problem.
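One way to check a robots.txt file yourself is with the `urllib.robotparser` module from Python's standard library. The sketch below is only an approximation: it uses the coccocbot-web user agent and the /cgi-bin/ rule that appear in the answer to question 2, the `is_allowed` helper and the example.com URLs are hypothetical, and Python's parser may not interpret every rule exactly the way our robot does.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; replace it with the contents of your own file.
ROBOTS_TXT = """\
User-agent: coccocbot-web
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent download url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "coccocbot-web", "http://example.com/index.html"))
print(is_allowed(ROBOTS_TXT, "coccocbot-web", "http://example.com/cgi-bin/search"))
```

The first call should report that the page is allowed and the second that it is disallowed. If a page you want indexed comes back as disallowed, the matching Disallow rule is the one to adjust.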

If you would like to manually add some pages to our index, submit the relevant URLs via this form:

2. I want to remove my site/pages from the search results on coccoc.com/search.

There are several ways you can instruct our robot to not add your pages to our index and/or to not download your pages.

First, you can give instructions to our robot in your robots.txt file. If, for example, you want to instruct our main web robot to not download any pages that have a path starting with /cgi-bin/, you can add the following instructions to your robots.txt:

User-agent: coccocbot-web
Disallow: /cgi-bin/

In most cases, URLs that are disallowed in robots.txt will not appear in our index, because they won't be downloaded by our robot. However, some pages with URLs disallowed in robots.txt might still be added to our index without the pages’ content. In these cases, snippets similar to the example below can appear in our search results:

example.com
There is no general description due to policy restricting access from the host site

To instruct our robot to completely exclude a page from our index, you can use the noindex instruction on the page. If you use noindex, make sure that the page's URL is not disallowed for downloading in your robots.txt, because our robot has to download the page in order to find the noindex instruction. With noindex in place, our robot won't add the page to the index even if it finds links to it.
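To confirm whether a page carries a noindex instruction, you can scan its HTML. The sketch below assumes the instruction is given in the widely used robots meta tag form (`<meta name="robots" content="noindex">`); this page does not state which forms our robot recognizes, so treat that as an assumption. The `RobotsMetaParser` class and `has_noindex` function are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag lists the noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        directives = [d.strip() for d in content.split(",")]
        if name == "robots" and "noindex" in directives:
            self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
normal = "<html><head><title>Hello</title></head></html>"
print(has_noindex(blocked))  # True
print(has_noindex(normal))   # False
```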

You can also use the nofollow instruction for some links on your site. If all links to a specific URL on your site use this instruction, our robot won't request the URL. Of course, because you can’t control other sites, they could still include a link to your site that doesn’t have the nofollow instruction. If our robot finds such links, it can request your URL and still add it to the index. Therefore, the nofollow instruction can be used internally on your site to exclude some of its sections or URLs from downloading and indexing by our robot. However, using nofollow like this cannot completely prevent your pages from being indexed.

It can take some time before changes to your robots.txt or other instructions are reflected in our index. If you've made such changes to your site and want them applied to our index sooner, you can submit your URL/site via this form:

3. Your robot overloads my site/server.

If you want to lower the rate at which our robot visits your site, you can use the crawl-delay directive in your robots.txt. For example, to set a 5-second crawl delay for our main web robot, add the following directives to your robots.txt:

User-agent: coccocbot-web
Crawl-delay: 5

Please note: We don't support crawl delays greater than 10 seconds. Therefore, a crawl delay of 100 seconds is treated as a 10-second crawl delay.
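The 10-second limit can be illustrated with a short sketch. It reads the Crawl-delay value with Python's standard `urllib.robotparser` module and then applies the cap; the `effective_crawl_delay` helper and `MAX_SUPPORTED_DELAY` constant are hypothetical names, and the code shows only the rule as stated in the note, not how our robot implements it.

```python
from urllib.robotparser import RobotFileParser

MAX_SUPPORTED_DELAY = 10  # seconds, the limit stated in the note above


def effective_crawl_delay(robots_txt, user_agent="coccocbot-web"):
    """Return the delay in seconds after applying the cap, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    requested = parser.crawl_delay(user_agent)
    if requested is None:
        return None
    return min(requested, MAX_SUPPORTED_DELAY)


print(effective_crawl_delay("User-agent: coccocbot-web\nCrawl-delay: 5\n"))    # 5
print(effective_crawl_delay("User-agent: coccocbot-web\nCrawl-delay: 100\n"))  # 10
```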

An increased crawl delay may slow down how quickly your site is updated in our index. If your site is small, this might not be a problem: even with a large delay value, our robot is able to request pages from your site frequently enough. However, if you have millions of pages, a large delay value can impact the update speed. In this case, first make sure all your indexable pages are useful to users. For example, a site might use faceted search, a common way to find an item in a big database by using filters. Every filter is represented by a parameter in URLs. The number of combinations of different values can be huge, though only a small number actually produce items from the database, so most of these URLs are not worth crawling. If you want to direct our robot not to download some pages from your site, please read the answer to question 2 above.
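To get a feel for how quickly filter combinations multiply, consider the rough calculation below. The filter names and value counts are invented for illustration, and the timing assumes one request per crawl-delay interval, which is a simplification rather than a description of how our robot schedules requests.

```python
from math import prod

# Hypothetical filter sizes for an imaginary shop; not taken from any real site.
FACET_VALUES = {"color": 12, "size": 8, "brand": 50, "sort": 4}
CRAWL_DELAY_SECONDS = 10  # the largest crawl delay that is supported


def count_filter_urls(facet_values):
    """Count URLs if every filter is either unset or set to one of its values."""
    return prod(n + 1 for n in facet_values.values())


def full_pass_days(url_count, delay_seconds):
    """Days needed to request every URL once, at one request per delay interval."""
    return url_count * delay_seconds / 86_400


urls = count_filter_urls(FACET_VALUES)
print(urls)  # 29835
print(round(full_pass_days(urls, CRAWL_DELAY_SECONDS), 1))  # 3.5
```

Under these assumptions, four modest filters already produce tens of thousands of URLs, and a single pass over them takes several days.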

If you are an Internet service provider and you want to lower the visit rate of our robot to your servers, please contact us via the questions and complaints form below.

4. Still have some questions or complaints?

Submit them via this form: