Site not being indexed by Google? Solution

Bruce Morrison

Thursday 27 November 2003 1:59:43 pm

Hi all

I have worked on a number of sites over the last 12 months and was becoming increasingly frustrated because they were not being spidered beyond the home page by Google. I found the reason this week!

Have you noticed that on some eZ Publish sites, the links on the first page visited have something like "?PHPSESSID=b0da36931dc38bd1f04e9a7af8c5b165" appended to them?

Well, this is the issue!

From another CMS mailing list I'm on:

"We were having a problem getting our action app content indexed (by google search, not news), so i asked my brother who had just started working at Google. He said:

1. yes, they do index the query string (stuff after the ?).
2. in order to do so, they pay attention to the problem of session variables in the query string by assuming that anything that looks like a session variable is one.
3. the long item ids are thus assumed to be session variables, and aren't getting spidered (i don't know the exact rule, but probably any string longer than 16 chars is going to be assumed to be a session variable).
4. they were trying to improve their algorithm for figuring out what's a session variable and what isn't."

This issue is not specific to eZ Publish. It comes from the combination of using sessions and a default PHP configuration that appends the session ID to links.

The PHP configuration setting is "session.use_trans_sid".

Turn it off and the session information will disappear from the links. The site will keep working fine, and Google will get beyond your home page.
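For anyone applying this, here is a minimal sketch of the change, assuming you are able to edit the server's php.ini (the file location depends on your install):

```ini
; php.ini
; Stop PHP from rewriting links to carry the session ID (PHPSESSID=...).
; 0 means off; sessions then rely on the session cookie instead.
session.use_trans_sid = 0
```

On shared hosting where php.ini is out of reach, the usual per-directory equivalent in an .htaccess file is the single line "php_flag session.use_trans_sid off". That assumes PHP runs as an Apache module and the host allows PHP overrides in .htaccess, so check with your host if it has no effect.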

See http://martin.f2o.org/php/session for details.

Cheers
Bruce
http://www.designit.com.au/

My Blog: http://www.stuffandcontent.com/
Follow me on twitter: http://twitter.com/brucemorrison
Consolidated eZ Publish Feed : http://friendfeed.com/rooms/ez-publish

Tristan Koen

Friday 28 November 2003 12:30:55 am

Brilliant Bruce!

We used to have exactly that problem too.... Google only indexed the landing page.
Our host recently upgraded to PHP 4.2.2 and suddenly Google indexed over 150 pages.

Never managed to figure out why until now.

bisk

Friday 28 November 2003 2:38:16 am

I'm having the same problem with session IDs on the first page and Google as well.

I guess not anymore, thanks Bruce.

The .htaccess fix works nicely.

-------------------------------
http://www.kookfijn.nl & http://www.magento.be

Simion Ward

Wednesday 17 December 2003 3:43:12 pm

Add the following meta tags to your site.ini.append file:

[SiteSettings]
MetaDataArray[robots]=all
MetaDataArray[robots]=index,follow
MetaDataArray[revisit-after]=5 days

Should help with indexing.

Simon
http://www.webrak.co.uk

Simion Ward

Thursday 18 December 2003 2:11:49 am

Just a quick note: Google indexed 25 MB of my site last night after I made this change.
