| Author |
Message |
facmit
Joined: 21 Oct 2009 Posts: 87
|
Posted: Thu Nov 05, 2009 12:52 am Post subject: google! [indexes the bogleheads fast] |
|
|
I posted a question half an hour ago, and Google already finds it, as the first result!
Too good to be true?? |
|
|
 |
norookie
Joined: 07 Jul 2009 Posts: 241
|
Posted: Thu Nov 05, 2009 3:28 am Post subject: |
|
|
Recognition by Goog is sweet for bogleheads and others.....imo. just sayin.. _________________ "I hope to put my last dime when I die, in the parking meter in front of the state house, then die in my car awaiting many parking tickets" |
|
|
 |
facmit
Joined: 21 Oct 2009 Posts: 87
|
Posted: Thu Nov 05, 2009 9:14 am Post subject: |
|
|
| norookie wrote: | | Recognition by Goog is sweet for bogleheads and others.....imo. just sayin.. |
Yes, but 30 minutes? That is too quick, isn't it? |
|
|
 |
asdfvcx
Joined: 18 Mar 2007 Posts: 7
|
Posted: Thu Nov 05, 2009 4:27 pm Post subject: |
|
|
Every time Google scans a website, it checks whether there have been changes since the last scan (and possibly how large the changes are).
If it finds that there have been changes (or large enough changes; their algorithm isn't public), it decreases the time between scans. In this way, sites with frequent updates, such as news sites or very active public forums, tend to get scanned much more frequently. And Google has both the computing power and the employee expertise to get the updated scans into their index very quickly. |
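To make the idea above concrete, here is a minimal sketch of an adaptive revisit schedule. Google's actual algorithm isn't public, so everything here is an assumption made for illustration: the `CrawlSchedule` class, the halving and 1.5x back-off factors, and the interval limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CrawlSchedule:
    """Hypothetical per-site revisit schedule (not Google's real algorithm)."""

    interval_minutes: float = 1440.0  # assumed starting point: once a day
    min_interval: float = 5.0         # assumed floor, so a site is never hammered
    max_interval: float = 43200.0     # assumed ceiling: 30 days

    def record_visit(self, changed: bool) -> float:
        """Update the interval after a crawl and return the new value."""
        if changed:
            # The site changed since the last scan: come back sooner.
            self.interval_minutes = max(self.min_interval, self.interval_minutes / 2)
        else:
            # Nothing new: back off and wait longer next time.
            self.interval_minutes = min(self.max_interval, self.interval_minutes * 1.5)
        return self.interval_minutes


if __name__ == "__main__":
    busy_forum = CrawlSchedule()
    for _ in range(5):
        # A busy forum has new posts on every visit.
        print(busy_forum.record_visit(changed=True))
```

Under these made-up numbers, a site that has changed on each of five visits drops from a daily scan to one every 45 minutes, while a static site drifts toward the 30-day ceiling.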
|
|
 |
KyleAAA
Joined: 01 Jul 2009 Posts: 346
|
Posted: Thu Nov 05, 2009 4:32 pm Post subject: |
|
|
| facmit wrote: | | norookie wrote: | | Recognition by Goog is sweet for bogleheads and others.....imo. just sayin.. |
Yes, but 30 minutes? That is too quick, isn't it? |
Not at all. It might be a bit slow, actually. Google keeps track of how often sites are updated and indexes them accordingly. Obviously, large forums are updated a lot and have plenty of backlinks, so Google comes back a lot. |
|
|
 |
Bernd
Joined: 26 Feb 2007 Posts: 217
|
Posted: Thu Nov 05, 2009 4:42 pm Post subject: |
|
|
Only 30 minutes?
Look at the Forum list of messages - the one you see first when you select this forum - the latest replies are from 66-70 minutes before the current time. Which list of messages does Google check, then? Or is there another list we do not see? |
|
|
 |
asdfvcx
Joined: 18 Mar 2007 Posts: 7
|
Posted: Thu Nov 05, 2009 4:48 pm Post subject: |
|
|
| Bernd wrote: | | Look at the Forum list of messages - the one you see first when you select this forum - the latest replies are from 66-70 minutes before the current time. |
Are you sure you have your timezone in your Profile set correctly? Due to the recent change in Daylight Saving Time, you may not have the correct time set. |
|
|
 |
KyleAAA
Joined: 01 Jul 2009 Posts: 346
|
Posted: Thu Nov 05, 2009 5:17 pm Post subject: |
|
|
| Bernd wrote: | Only 30 minutes?
Look at the Forum list of messages - the one you see first when you select this forum - the latest replies are from 66-70 minutes before the current time. Which list of messages does Google check, then? Or is there another list we do not see? |
It doesn't check a list (although the forum may have a sitemap); it simply crawls the web. I have a few websites that I only update once daily (sometimes less), and they are almost always indexed by Google within 30 minutes. It's not that unusual. |
|
|
 |
Alex Frakt Site Admin
Joined: 23 Feb 2007 Posts: 3636 Location: Chicago
|
Posted: Thu Nov 05, 2009 5:56 pm Post subject: |
|
|
| Google's spiders are constantly crawling this site. They retrieve around 3 pages per second and make up over 10% of our total traffic. All the rest of the search engines put together are less than 2%. |
|
|
 |
facmit
Joined: 21 Oct 2009 Posts: 87
|
Posted: Thu Nov 05, 2009 8:03 pm Post subject: |
|
|
| KyleAAA wrote: |
It doesn't check a list (although the forum may have a sitemap); it simply crawls the web. I have a few websites that I only update once daily (sometimes less), and they are almost always indexed by Google within 30 minutes. It's not that unusual. |
So basically it crawls the whole internet within 30 minutes? |
|
|
 |
Jack
Joined: 27 Feb 2007 Posts: 1077
|
Posted: Thu Nov 05, 2009 9:34 pm Post subject: |
|
|
| Alex Frakt wrote: | | Google's spiders are constantly crawling this site. They retrieve around 3 pages per second and make up over 10% of our total traffic. All the rest of the search engines put together are less than 2%. |
10% of traffic is Google? That is an amazing number. Multiply that by the many millions of web sites and it means Google is very, very busy. I wonder what percentage of all web traffic, or of non-email traffic, is just Google bots.
I know that you can put instructions in the robots.txt file to keep bots from indexing certain or all pages. Is there a way to allow indexing but to limit the frequency to reduce bandwidth consumed? |
|
|
 |
LadyGeek

Joined: 20 Dec 2008 Posts: 822 Location: Philly suburb
|
Posted: Thu Nov 05, 2009 9:42 pm Post subject: |
|
|
It doesn't seem to be crawling the wiki very frequently. Maybe it's not intended to do so.
The Google search engine is available for this forum, if anyone is looking. Check out the search engine choices at the top of the wiki's main page. _________________ Some say the glass half-full. Others say the glass is half-empty. To an engineer, it’s twice as big as it needs to be. Link to Wiki |
|
|
 |
asdfvcx
Joined: 18 Mar 2007 Posts: 7
|
Posted: Thu Nov 05, 2009 10:25 pm Post subject: |
|
|
| Jack wrote: | | I know that you can put instructions in the robots.txt file to keep bots from indexing certain or all pages. Is there a way to allow indexing but to limit the frequency to reduce bandwidth consumed? |
Crawl-delay is a non-standard directive that you can add to a robots.txt file. I believe it is respected by Yahoo and MSN, but not Google.
Google instead has a feature where you can log into their Webmaster Tools site and set the crawl rate for your site. But before you can do this, you have to register the site with Google and verify that you control it.
http://www.google.com/support/....swer=48620 |
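For anyone who wants to see what that looks like, here is a small sketch using Python's standard `urllib.robotparser` module to read a Crawl-delay value. The robots.txt content below is a made-up example, not this forum's actual file, and whether a given crawler honors the directive is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for illustration only (not the forum's real file).
# "Slurp" is the user-agent name used by Yahoo's crawler.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds the named bot is asked to wait between requests.
print(parser.crawl_delay("Slurp"))

# Bots with no Crawl-delay rule of their own get None.
print(parser.crawl_delay("SomeOtherBot"))

# Ordinary allow/disallow rules are still checked the usual way.
print(parser.can_fetch("SomeOtherBot", "/private/page.html"))
```

The directive only asks politely: a bot that ignores Crawl-delay, as Google reportedly does, has to be throttled some other way, such as the crawl rate setting mentioned above.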
|
|
 |
mudfud

Joined: 20 Feb 2007 Posts: 932
|
Posted: Thu Nov 05, 2009 11:16 pm Post subject: |
|
|
Check out the number of "users online" on the index page of this forum.
http://www.bogleheads.org/forum/index.php
Currently there are "94 guests".
Many of these are googlebots. _________________ "Are you sure you have tested an a priori hypothesis?"
 |
|
|
 |
Alex Frakt Site Admin
Joined: 23 Feb 2007 Posts: 3636 Location: Chicago
|
Posted: Thu Nov 05, 2009 11:18 pm Post subject: |
|
|
| asdfvcx wrote: | | Google instead has a feature where you can log into their Webmaster Tools site and set the crawl rate for your site. But before you can do this, you have to register the site with Google and verify that you control it. |
I've used this to get them to slow it down a couple of times when we were having temporary bandwidth issues. |
|
|
 |
KyleAAA
Joined: 01 Jul 2009 Posts: 346
|
Posted: Fri Nov 06, 2009 9:21 am Post subject: |
|
|
| facmit wrote: | | KyleAAA wrote: |
It doesn't check a list (although the forum may have a sitemap); it simply crawls the web. I have a few websites that I only update once daily (sometimes less), and they are almost always indexed by Google within 30 minutes. It's not that unusual. |
So basically it crawls the whole internet within 30 minutes? |
Of course not. The vast majority of the internet isn't updated often (if ever: some sites haven't changed in years), so there's no need to bother. How often Google crawls a site is directly related to how often the site is updated. |
|
|
 |
KyleAAA
Joined: 01 Jul 2009 Posts: 346
|
Posted: Fri Nov 06, 2009 9:22 am Post subject: |
|
|
| Jack wrote: | | Alex Frakt wrote: | | Google's spiders are constantly crawling this site. They retrieve around 3 pages per second and make up over 10% of our total traffic. All the rest of the search engines put together are less than 2%. |
10% of traffic is Google? That is an amazing number. Multiply that by the many millions of web sites and it means Google is very, very busy. I wonder what percentage of all web traffic, or of non-email traffic, is just Google bots.
I know that you can put instructions in the robots.txt file to keep bots from indexing certain or all pages. Is there a way to allow indexing but to limit the frequency to reduce bandwidth consumed? |
Absolutely. Google Webmaster Tools will let you do it. But that would generally be a mistake. |
|
|
 |
|