Bogleheads
Investing Advice Inspired by Jack Bogle

google! [indexes the bogleheads fast]

 
Bogleheads Forum Index -> Forum Issues and Administration
facmit



Joined: 21 Oct 2009
Posts: 87

Posted: Thu Nov 05, 2009 12:52 am    Post subject: google! [indexes the bogleheads fast]

I posted a question half an hour ago, and Google already finds it, as the first result!

Too good to be true?
norookie



Joined: 07 Jul 2009
Posts: 241

Posted: Thu Nov 05, 2009 3:28 am

Recognition by Goog is sweet for Bogleheads and others... imo. Just sayin'.
_________________
"I hope to put my last dime when I die, in the parking meter in front of the state house, then die in my car awaiting many parking tickets"
facmit



Joined: 21 Oct 2009
Posts: 87

Posted: Thu Nov 05, 2009 9:14 am

norookie wrote:
Recognition by Goog is sweet for Bogleheads and others... imo. Just sayin'.

Yes, but 30 minutes? That is too quick, isn't it?
asdfvcx



Joined: 18 Mar 2007
Posts: 7

Posted: Thu Nov 05, 2009 4:27 pm

Every time Google scans a website, it checks whether there have been changes since the last scan (and possibly how large the changes were).

If it finds changes (or large enough changes; their algorithm isn't public), it shortens the time between scans. In this way, sites with frequent updates, such as news sites or very active public forums, tend to get scanned much more frequently. And Google has both the computing power and the employee expertise to get the updated scans into its index very quickly.
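
The feedback loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's method (which, as noted, isn't public): the halving/doubling rule, the bounds `MIN_INTERVAL` and `MAX_INTERVAL`, and the function names `fingerprint` and `next_interval` are all assumptions made up for this example.

```python
import hashlib

# Hypothetical bounds (assumptions for this sketch), in seconds.
MIN_INTERVAL = 5 * 60              # never rescan more often than every 5 minutes
MAX_INTERVAL = 30 * 24 * 60 * 60   # never wait longer than 30 days


def fingerprint(page_text: str) -> str:
    """Return a hash of the page so two scans can be compared cheaply."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def next_interval(current: int, old_fp: str, new_fp: str) -> int:
    """Shorten the wait if the page changed, lengthen it if it did not."""
    if old_fp != new_fp:
        return max(MIN_INTERVAL, current // 2)   # changed: come back sooner
    return min(MAX_INTERVAL, current * 2)        # unchanged: back off


# Usage: a page scanned hourly that has just changed gets a shorter interval.
interval = 60 * 60
before = fingerprint("forum index with 10 topics")
after = fingerprint("forum index with 11 topics")
interval = next_interval(interval, before, after)
print(interval)  # 1800
```

Under this rule a busy forum quickly drifts toward the minimum interval, while a page that never changes drifts toward the maximum.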
KyleAAA



Joined: 01 Jul 2009
Posts: 346

Posted: Thu Nov 05, 2009 4:32 pm

facmit wrote:
norookie wrote:
Recognition by Goog is sweet for Bogleheads and others... imo. Just sayin'.

Yes, but 30 minutes? That is too quick, isn't it?

Not at all. It might be a bit slow, actually. Google keeps track of how often sites are updated and indexes them accordingly. Obviously, large forums are updated a lot and have plenty of backlinks, so Google comes back a lot.
Bernd



Joined: 26 Feb 2007
Posts: 217

Posted: Thu Nov 05, 2009 4:42 pm

Only 30 minutes?
Look at the forum's list of messages (the one you see first when you select this forum): the latest replies are from 66-70 minutes ago. Which list of messages does Google check, then? Or is there another list we do not see?
asdfvcx



Joined: 18 Mar 2007
Posts: 7

Posted: Thu Nov 05, 2009 4:48 pm

Bernd wrote:
Look at the forum's list of messages (the one you see first when you select this forum): the latest replies are from 66-70 minutes ago.

Are you sure you have your timezone in your Profile set correctly? Due to the recent change in Daylight Saving Time, you may not have the correct time set.
KyleAAA



Joined: 01 Jul 2009
Posts: 346

Posted: Thu Nov 05, 2009 5:17 pm

Bernd wrote:
Only 30 minutes?
Look at the forum's list of messages (the one you see first when you select this forum): the latest replies are from 66-70 minutes ago. Which list of messages does Google check, then? Or is there another list we do not see?

It doesn't check a list (although the forum may have a sitemap); it simply crawls the web. I have a few websites that I only update once daily (sometimes less), and they are almost always indexed by Google within 30 minutes. It's not that unusual.
Alex Frakt
Site Admin


Joined: 23 Feb 2007
Posts: 3636
Location: Chicago

Posted: Thu Nov 05, 2009 5:56 pm

Google's spiders are constantly crawling this site. They retrieve around 3 pages per second and make up over 10% of our total traffic. All the rest of the search engines put together are less than 2%.
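
To put those figures in perspective, here is some back-of-the-envelope arithmetic in Python. It assumes that 3 pages per second is a steady around-the-clock average and that the 10% share is measured in page requests rather than bytes; the post does not say either way, so treat the results as rough.

```python
SECONDS_PER_DAY = 24 * 60 * 60

googlebot_pages_per_second = 3   # figure quoted in the post
googlebot_share_percent = 10     # "over 10%" in the post, taken as exactly 10

# Pages fetched per day if the rate were constant (an assumption).
pages_per_day = googlebot_pages_per_second * SECONDS_PER_DAY

# If Googlebot is at least 10% of requests, total traffic is at most this.
max_total_pages_per_second = googlebot_pages_per_second * 100 / googlebot_share_percent

print(pages_per_day)               # 259200
print(max_total_pages_per_second)  # 30.0
```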
facmit



Joined: 21 Oct 2009
Posts: 87

Posted: Thu Nov 05, 2009 8:03 pm

KyleAAA wrote:
It doesn't check a list (although the forum may have a sitemap); it simply crawls the web. I have a few websites that I only update once daily (sometimes less), and they are almost always indexed by Google within 30 minutes. It's not that unusual.

So basically it crawls the whole internet within 30 minutes?
Jack



Joined: 27 Feb 2007
Posts: 1077

Posted: Thu Nov 05, 2009 9:34 pm

Alex Frakt wrote:
Google's spiders are constantly crawling this site. They retrieve around 3 pages per second and make up over 10% of our total traffic. All the rest of the search engines put together are less than 2%.

10% of traffic is Google? That is an amazing number. Multiply that by the many millions of websites and it means Google is very, very busy. I wonder what percentage of all web traffic, or of non-email traffic, is just Google bots.

I know that you can put instructions in the robots.txt file to keep bots from indexing certain or all pages. Is there a way to allow indexing but to limit the frequency to reduce bandwidth consumed?
LadyGeek



Joined: 20 Dec 2008
Posts: 822
Location: Philly suburb

Posted: Thu Nov 05, 2009 9:42 pm

It doesn't seem to be crawling the wiki very frequently. Maybe it's not intended to do so.

The Google search engine is available for this forum, if anyone is looking. Check out the search engine choices at the top of the wiki's main page.
_________________
Some say the glass is half-full. Others say the glass is half-empty. To an engineer, it’s twice as big as it needs to be.
asdfvcx



Joined: 18 Mar 2007
Posts: 7

Posted: Thu Nov 05, 2009 10:25 pm

Jack wrote:
I know that you can put instructions in the robots.txt file to keep bots from indexing certain or all pages. Is there a way to allow indexing but to limit the frequency to reduce bandwidth consumed?

Crawl-Delay is a non-standard parameter that you can insert into a robots.txt file. I believe it is respected by Yahoo and MSN, but not Google.

Google instead has a feature where you can log into its Webmaster Tools site and set the crawl rate for your site. But before you can do this, you have to register the site with Google and verify that you control it.

http://www.google.com/support/....swer=48620
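
For illustration, this is roughly what a Crawl-delay line looks like and how a crawler that chooses to honor it might read it. The sketch uses Python's standard `urllib.robotparser`; the robots.txt contents, the 10-second value, the `/private/` path, and the `ExampleBot` name are made up for the example, and nothing here changes how Google itself crawls.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow crawling, but ask bots to wait 10 s between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("ExampleBot")                         # requested delay in seconds
forum_ok = parser.can_fetch("ExampleBot", "/forum/index.php")    # not disallowed
private_ok = parser.can_fetch("ExampleBot", "/private/x.html")   # disallowed

print(delay, forum_ok, private_ok)  # 10 True False
```

The directive is only a request: it is up to each crawler to sleep for `delay` seconds between fetches, which is why support varies from one search engine to another.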
mudfud



Joined: 20 Feb 2007
Posts: 932

Posted: Thu Nov 05, 2009 11:16 pm

Check out the number of "users online" on the index page of this forum.

http://www.bogleheads.org/forum/index.php

Currently there are "94 guests".

Many of these are Googlebots.
_________________
"Are you sure you have tested an a priori hypothesis?"

Alex Frakt
Site Admin


Joined: 23 Feb 2007
Posts: 3636
Location: Chicago

Posted: Thu Nov 05, 2009 11:18 pm

asdfvcx wrote:
Google instead has a feature where you can log into its Webmaster Tools site and set the crawl rate for your site. But before you can do this, you have to register the site with Google and verify that you control it.

I've used this to get them to slow it down a couple of times when we were having temporary bandwidth issues.
KyleAAA



Joined: 01 Jul 2009
Posts: 346

Posted: Fri Nov 06, 2009 9:21 am

facmit wrote:
KyleAAA wrote:
It doesn't check a list (although the forum may have a sitemap); it simply crawls the web. I have a few websites that I only update once daily (sometimes less), and they are almost always indexed by Google within 30 minutes. It's not that unusual.

So basically it crawls the whole internet within 30 minutes?

Of course not. The vast majority of the internet isn't updated often (if ever: some sites haven't changed in years), so there's no need to bother. How often Google crawls a site is directly related to how often the site is updated.
KyleAAA



Joined: 01 Jul 2009
Posts: 346

Posted: Fri Nov 06, 2009 9:22 am

Jack wrote:
Alex Frakt wrote:
Google's spiders are constantly crawling this site. They retrieve around 3 pages per second and make up over 10% of our total traffic. All the rest of the search engines put together are less than 2%.

10% of traffic is Google? That is an amazing number. Multiply that by the many millions of websites and it means Google is very, very busy. I wonder what percentage of all web traffic, or of non-email traffic, is just Google bots.

I know that you can put instructions in the robots.txt file to keep bots from indexing certain or all pages. Is there a way to allow indexing but to limit the frequency to reduce bandwidth consumed?

Absolutely. Google Webmaster Tools will let you do it. But that would generally be a mistake.
All times are GMT - 5 Hours