Can't get my robots.txt file recognised

Tony Coe

Thursday 24 August 2006 7:16:55 am

Apologies if this has been covered elsewhere. I have found a couple of articles on rewriting and tried what they suggested, but without success.

I am configuring a copy of eZ Publish to run across several domains and have it all running OK, but I can't get my robots.txt file to be picked up anywhere. I think the problem may be caused by my path prefix, but I can't work out how to get around it.

The domain in question is set to use the path prefix of /noni_horses/

I originally added the line
RewriteRule ^/robots.txt - [L]
to the rewrite section of my virtual host settings.
When this didn't work and thinking the prefix might be causing the problem, I changed it to
RewriteRule ^/noni_horses/robots.txt - [L]
Again with no joy.

I have a copy of my robots.txt file in the root of my site and have also tried putting a copy in a subfolder /noni_horses/

I always just get a kernel 20 error.

I'll be the first to admit that I'm not entirely sure what I'm doing here. I have a pretty good working knowledge of PHP, but very limited experience of Apache. Please can someone tell me what I'm doing wrong? Help!

Marcin Drozd

Thursday 24 August 2006 7:31:32 am

Hi
If you have
RewriteRule !\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf$ index.php
try to add:

|robots\.txt

and perhaps
<FilesMatch "(index\.php|robots\.txt|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">

http://ez-publish.pl

Claudia Kosny

Thursday 24 August 2006 12:09:59 pm

Hello Tony,

Your robots.txt is not supposed to be picked up by eZ Publish, so your first rewrite rule
^/robots.txt - [L]
should be ok.

The search engine spiders will pick it up only at the root of your server, no matter where you installed eZ Publish. If you have installed eZ Publish in a subdirectory /noni_horses of your server docroot, the robots.txt still needs to go in the docroot, and you have to account for the subfolder in the paths you list in your robots.txt.

So what you have to achieve is that your robots.txt is displayed when you call up http://serverroot/robots.txt.

This of course changes if you have virtual host settings that let you call up your server with http://www.noni_horses.<whatever tld you use>
In this case the spiders do not see that you are using a subdirectory.

Getting a kernel 20 error when trying to call up the robots.txt via eZ Publish is expected. After all, it is not a module or anything like that.

Greetings from Luxembourg

Claudia

Tony Coe

Thursday 31 August 2006 2:47:17 am

Hi Claudia/Marcin,

I just can't get it to work!
I've tried adding the following below 'RewriteEngine On' in my vhost.conf file:
RewriteRule !(^/design|^/var/.*/storage|^/var/storage|^/var/.*/cache|^/var/cache|^/noni_horses/robots\.txt|^/extension/.*/design|^/kernel/setup/packages|^/packages|^/share/icons).*\.(gif|css|jpg|png|jar|js|ico|pdf|swf)$ /index.php

I also tried
RewriteRule !(^/design|^/var/.*/storage|^/var/storage|^/var/.*/cache|^/var/cache|^/robots\.txt|^/extension/.*/design|^/kernel/setup/packages|^/packages|^/share/icons).*\.(gif|css|jpg|png|jar|js|ico|pdf|swf)$ /index.php

and also
RewriteRule ^/robots.txt - [L]
and
RewriteRule ^/noni_horses/robots.txt - [L]

I also tried changing AllowOverride None to AllowOverride All for the virtual host and tried putting the same in the .htaccess file, with no joy.

I also tried adding:
<FilesMatch "(index\.php|robots\.txt|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">
order allow,deny
allow from all
</FilesMatch>

with no difference
and also
<FilesMatch "(index\.php|\noni_horses\robots\.txt|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">
order allow,deny
allow from all
</FilesMatch>

I know I'm probably just being stupid and not understanding how the rewrites work here, but I really can't work out where I'm going wrong...

Incidentally, I notice that there doesn't seem to be a valid robots.txt on ez.no!
(at least trying to access ez.no/robots.txt gets a kernel 20 error, same as I'm getting....)

Claudia Kosny

Thursday 31 August 2006 9:29:34 am

Hello Tony

Unfortunately I made a mistake in my previous posting which might well be the cause of the problem you still have.
The rewrite rule must _not_ have a leading slash, as this is part of the directory structure and will be stripped by the rewrite engine. So just remove the slash and it should work fine.
Forget about the path noni_horses as the robots.txt must be in the document root of your virtual host.
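
For context on the leading slash: mod_rewrite strips the directory prefix only in per-directory context (.htaccess or a <Directory> block). In server or <VirtualHost> context the pattern is matched against the full URL path, which still begins with a slash. A minimal sketch of both forms, assuming robots.txt sits in the document root and the pass-through rule is placed before the catch-all rule that sends requests to index.php:

```apache
RewriteEngine On

# Per-directory context (.htaccess in the document root, or <Directory>):
# the path is matched without its leading slash.
# "-" means "leave the URL unchanged"; [L] stops further rule processing.
RewriteRule ^robots\.txt$ - [L]

# Server or <VirtualHost> context: the path is matched with its leading slash.
# Use this form instead if the rule lives directly in the virtual host.
# RewriteRule ^/robots\.txt$ - [L]
```

Only one of the two forms should be needed, depending on where the rules live.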

If you still have problems, please check the rewrite.log - there you can see which rewrite rules are applied to which file.
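
If the rewrite log is not enabled yet, something like the following should switch it on in Apache 2.0/2.2. This is a sketch only: the log path is an assumption, and these directives are valid in server or <VirtualHost> context, not in .htaccess.

```apache
# Apache 2.0/2.2 only; place inside the <VirtualHost> block.
RewriteEngine On

# Hypothetical log location; adjust to wherever your Apache logs live.
RewriteLog "/var/log/apache2/rewrite.log"

# 0 disables logging; higher values are more verbose. 3 usually shows
# which pattern was applied to which URL. Turn it off again afterwards.
RewriteLogLevel 3
```

Apache 2.4 removed RewriteLog and RewriteLogLevel; there the equivalent is a line such as `LogLevel alert rewrite:trace3`, with the output going to the error log.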

You are right that ez.no does not seem to have a robots.txt, although they might check the user agent in their htaccess/virtual host and only deliver it to certain spiders, not to web browsers. On the other hand, I think the main reason for not having a robots.txt is that you don't need one if you have only an eZ Publish installation on your server. Provided you use the htaccess or virtual host settings as recommended during installation, there is no need to forbid any folder to a search engine: anything worth protecting is protected by htaccess or a required login.

To be on the safe side, here are the rules that work for me:

php_value allow_call_time_pass_reference 0

<FilesMatch ".">
order allow,deny
deny from all
</FilesMatch>

<FilesMatch "(^robots\.txt$)|(index\.php|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">
order allow,deny
allow from all
</FilesMatch>

RewriteEngine On

RewriteRule ^robots\.txt$ robots.txt [L]
RewriteRule !\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf$ index.php

DirectoryIndex index.php
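
The access rules above use the `order`/`allow`/`deny` syntax of Apache 2.2 and earlier. On Apache 2.4 the same deny-then-allow idea would probably be written with `Require` instead. The following is an untested sketch of that translation, not part of the configuration reported to work in this thread; in .htaccess it assumes that AllowOverride permits AuthConfig.

```apache
# Apache 2.4+ (mod_authz_core) sketch of the same access rules.
# Deny every file by default...
<FilesMatch ".">
    Require all denied
</FilesMatch>

# ...then allow robots.txt, index.php and the static file types again.
<FilesMatch "(^robots\.txt$)|(index\.php|\.(gif|jpe?g|png|css|js|html)|var(.+)storage.pdf(.+)\.pdf)$">
    Require all granted
</FilesMatch>
```

The rewrite rules themselves should not need to change between the two versions.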

Greetings from Luxembourg

Claudia
