How can I make sure my WHMCS client area is 100% not indexed by Google?


rob2

Hello,

I'd like to use the WHMCS announcements and knowledgebase features to publish information, and I want search engines (Google, Bing, Yahoo, etc.) to index those articles to improve my SEO.

However, I'm worried that search engines will also pick up data from my client area, which may include clients' billing or product details, or data from the admin area.

How can I prevent that?

Thank you

A bot would need the login information in order to view the member area. Additionally, a bot has to find a link on a page it already has access to in order to discover another page.

So, if you created a directory, say XYZ, and put some files in it but never linked to it from any of your pages or any sitemap, then no bot would be able to find it unless it guessed the directory name and the names of the files in it.

A bot is no different from a visitor to your site. It has no special access to your site. It can't view or index your member area without a valid username and password, nor can it find pages that aren't linked from pages it has access to.

So, if you created a directory, say XYZ, and put some files in it but never linked to it from any of your pages or any sitemap, then no bot would be able to find it unless it guessed the directory name and the names of the files in it.

I suggest you try this, as the bot will treat them as separate websites.

Any live page can be found unless it is protected by a login or by robots.txt.

This may give you a bit of a start. It goes into your .htaccess file:

# Deny access to some scrapers and minor crawlers by User-Agent.
# Patterns that contain a space are quoted; unquoted, Apache would
# read the second word as a variable name instead of part of the pattern.
SetEnvIfNoCase User-Agent "Indy Library" stayout=1
SetEnvIfNoCase User-Agent "libwww-perl" stayout=1
SetEnvIfNoCase User-Agent "Offline Navigator" stayout=1
SetEnvIfNoCase User-Agent "Xaldon WebSpider" stayout=1
SetEnvIfNoCase User-Agent "Wget" stayout=1
SetEnvIfNoCase User-Agent "WebImages" stayout=1
SetEnvIfNoCase User-Agent "WebCapture" stayout=1
SetEnvIfNoCase User-Agent "LiteFinder" stayout=1
SetEnvIfNoCase User-Agent "disco" stayout=1
SetEnvIfNoCase User-Agent "SBIder" stayout=1
SetEnvIfNoCase User-Agent "MJ12bot" stayout=1
SetEnvIfNoCase User-Agent "Baiduspider" stayout=1

# Apache 2.2 syntax; on Apache 2.4 this needs mod_access_compat.
Order Allow,Deny
Allow from all
Deny from env=stayout
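
If you want to check which visitors rules like these would actually catch, here is a rough Python sketch that mimics the intended matching (case-insensitive, substring anywhere in the User-Agent). It is not part of WHMCS or Apache; the function name `is_blocked` and the sample user-agent strings are my own assumptions for illustration.

```python
import re

# Substrings taken from the SetEnvIfNoCase lines above.
BLOCKED_PATTERNS = [
    "Indy Library", "libwww-perl", "Offline Navigator", "Xaldon WebSpider",
    "Wget", "WebImages", "WebCapture", "LiteFinder", "disco", "SBIder",
    "MJ12bot", "Baiduspider",
]

# One case-insensitive, unanchored regex, mirroring how SetEnvIfNoCase
# searches for a pattern anywhere in the header value.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(p) for p in BLOCKED_PATTERNS), re.IGNORECASE
)


def is_blocked(user_agent: str) -> bool:
    """Return True if this User-Agent would be flagged with stayout=1."""
    return _BLOCKED_RE.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical sample User-Agent strings.
    samples = [
        "Wget/1.21.3",
        "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for ua in samples:
        status = "blocked" if is_blocked(ua) else "allowed"
        print(f"{status:7}  {ua}")
```

Note that Googlebot and Bingbot are not on the list, so these rules do not stop Google or Bing from indexing anything; they only turn away some scrapers. Short patterns such as "disco" can also match more than intended (for example, a User-Agent containing "Discordbot").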

I suggest you try this, as the bot will treat them as separate websites.

Any live page can be found unless it is protected by a login or by robots.txt.

 

Any live page can be found by a bot just as it can be found by a person, but a bot has no special access to your site for finding pages, directories, or files.

If you put up a page xxyy3kdls.html and don't tell anyone about it or reference it in any of your pages, it will be live, but a bot won't be able to find it except by guessing or brute force, so it's highly unlikely to be found by a bot or a person. But if you link to it or reference it from another page on your site, then it can be found by both a bot and a live person.

The point I'm trying to make is that a bot doesn't have any more access to your site than a live person does. If there is a member area that requires a login, then neither a bot nor a person can access it without a login. If you have a page on your site that isn't referenced anywhere in your other pages, neither a bot nor a live person will be able to find it except by guessing.
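
To make the idea concrete, here is a small Python sketch of link-based discovery. It is only a toy model, not a real crawler: the `SITE` dictionary is a made-up site map (page to the links on that page), and the page names other than xxyy3kdls.html are assumptions for illustration.

```python
from collections import deque

# Hypothetical site: each page maps to the links it contains.
SITE = {
    "/index.php": ["/announcements.php", "/knowledgebase.php"],
    "/announcements.php": ["/index.php"],
    "/knowledgebase.php": ["/index.php", "/announcements.php"],
    "/xxyy3kdls.html": [],  # live, but nothing links to it
}


def discover(start: str) -> set[str]:
    """Breadth-first walk over links, the way a crawler finds pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in SITE.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


if __name__ == "__main__":
    found = discover("/index.php")
    print(sorted(found))
    print("/xxyy3kdls.html" in found)
```

Starting from the home page, the walk reaches only the pages that are linked somewhere; the unlinked page never enters the queue. Keep in mind that this only models links on your own site. A URL can still leak if someone links to it from elsewhere, so a login remains the real protection for private data.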

I suggest you try this, as the bot will treat them as separate websites.

Any live page can be found unless it is protected by a login or by robots.txt.

 

The robots.txt file is just a text file. If you protect it, then nothing will be able to access it, so you would be better off just not having one.

Also, the robots.txt file is voluntary. Bad bots will often use the robots.txt file to see what is on your site, and a person can display it just by pointing their browser at it. If you list a directory there that you aren't referencing anywhere, like in my example above, you have just told the world where it is, so it's best to leave it out. If you have a page that you don't want indexed by a spider, but it needs to be linked so people can find it on your website, then you would want to tell the bot to exclude it from being indexed.
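
As a rough illustration of that last case, here is a Python sketch that holds a sample robots.txt in a string and uses the standard library's `urllib.robotparser` to check which paths a well-behaved bot would skip. The file names (clientarea.php, cart.php, knowledgebase.php, announcements.php) are assumptions about a typical WHMCS install, so check them against your own URLs. Following the advice above, the admin directory is deliberately not listed.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt. The paths are assumptions about a typical
# WHMCS layout; adjust them to match your own installation.
ROBOTS_TXT = """\
User-agent: *
Disallow: /clientarea.php
Disallow: /cart.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if a bot that honors robots.txt may fetch this path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/knowledgebase.php", "/announcements.php",
                 "/clientarea.php", "/cart.php"):
        print(f"{path:22} {'crawl' if allowed(path) else 'skip'}")
```

This only describes what compliant bots such as the major search engines are asked to do. It does not block anyone, so client data still has to sit behind the login.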
