PHP Search in nodes escaping special characters

Author Message

Damien MARTIN

Tuesday 01 February 2011 1:26:55 am

Hi there,

I want to do a search using PHP in a tree.

So I use :

$entreprises = eZContentObjectTreeNode::subTreeByNodeID(
    array(
        'ClassFilterType' => 'include',
        'ClassFilterArray' => array('etablissement'),
        'AttributeFilter' => $criteres,
        'SortBy' => array('name', true)
    ),
    2283
);

and where

$criteres[] = array('etablissement/ville', 'like', "*$commune*");

It works very well but I have a problem with accents.

When I'm looking for "Chateau-Thierry" I can't find "Château-Thierry".

But when I search in the back office, I find the same results while typing "chateau" or "château". So I suppose There is something I can do.

But what ?

Could you help me with this ?

Adrien LOCHON

Tuesday 01 February 2011 6:05:16 am

Hi,

I would have used a ID for that. Instead of using a filter on a name, with spaces, accents and things like that, a ID is much better for accuracy.

Or, if you don't want to use any ID or you can't, use a string converter function like this :

function tvReplayImportFiltreBadChars($sContent) {
    $sContent = str_replace(" ", " ", $sContent);
    $sContent = trim($sContent);
    
    $sContent = str_replace("«", "\"", $sContent);
    $sContent = str_replace("»", '"', $sContent);
    $sContent = str_replace("“", "\"", $sContent);
    $sContent = str_replace("”", "\"", $sContent);
    $sContent = str_replace("…", "...", $sContent);
    $sContent = str_replace("´", "'", $sContent);
    $sContent = str_replace("‘", "'", $sContent);
    $sContent = str_replace("’", "'", $sContent);
    $sContent = str_replace("œ", "oe", $sContent);
    $sContent = str_replace(chr(197).chr(34), "oe", $sContent);
    $sContent = str_replace(chr(226) . chr(128) . chr(147), "-", $sContent);
    $sContent = str_replace("Œ", "OE", $sContent);
    $sContent = str_replace("—", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("•", "-", $sContent);
    $sContent = str_replace("Ÿ", "Y", $sContent);
    $sContent = str_replace("ž", "z", $sContent);
    $sContent = str_replace("Ž", "Z", $sContent);
    $sContent = str_replace("Š", "S", $sContent);
    $sContent = str_replace("š", "s", $sContent);
    $sContent = str_replace("›", ">", $sContent);
    $sContent = str_replace("‹", "<", $sContent);
    $sContent = str_replace("€", "E", $sContent);
    $sContent = str_replace(chr(226).chr(130).chr(172) , 'euros', $sContent );
    $sContent = str_replace(chr(195).chr(169), "é", $sContent);
    $sContent = str_replace(chr(195).chr(34), "û", $sContent);
    $sContent = str_replace(chr(195).chr(170), "ê", $sContent);
    $sContent = str_replace(chr(195).chr(168), "è", $sContent);
    $sContent = str_replace(chr(195).chr(32), "à" . chr(32), $sContent);
    $sContent = str_replace(chr(195).chr(162), "â", $sContent);
    $sContent = str_replace(chr(195).chr(69), "à", $sContent);
    $sContent = str_replace(chr(194).chr(34), '"', $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(153), "'", $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(166), "...", $sContent);
    $sContent = str_replace(chr(195).chr(167), "ç", $sContent);
    
    $sContent = trim($sContent);

    $sContent = strtoupper($sContent);
    $aLettres = array(",", ";", ".", ":", "°", "-", "_", "'", '"', "&", " ", "/", "\\", "@", "$", "%", "£", "¤", "µ", "*", "!", "§");
    $sContent = str_replace($aLettres, " ", $sContent);
    $sContent = str_replace(" ", "", $sContent);
    
    $sContent = strtr(utf8_decode($sContent), utf8_decode("ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"), "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn");
    $sContent = strtolower($sContent);
    $sContent = utf8_encode($sContent);
    
    return $sContent;
}

This will change something like "île-de-fortûné" into "iledefortune", avoiding search errors, at least most of them i suppose... I use it a lot to match movies titles between our portal and our partners.

My opinion is : use ids :)

Damien MARTIN

Tuesday 01 February 2011 6:35:02 am

Thank you very much Adrien, but it is not what i'm looking for.

Since this morning I wrote a little piece of code to do what I want :

if($commune != "tous")
{
    // Conversion de la chaine de caractere pour la passer en ASCII
    $commune = iconv("UTF-8", "ASCII//TRANSLIT", $commune);

    $resultat = array();
    foreach($entreprises as $e)
    {
        $c = eZContentObject::fetchByNodeID($e->NodeID);
        $dm = $c->DataMap();

        $ville = $dm['ville']->DataText;
        $ville = iconv("UTF-8", "ASCII//TRANSLIT", $ville);

        $commune = strtoupper($commune);
        $ville = strtoupper($ville);

        $commune = str_replace("-", " ", $commune);
        $ville = str_replace("-", " ", $ville);

        //echo "<!-- $commune / $ville -->\n";

        unset($dm);
        unset($c);

        if($commune == $ville)
        {
            $resultat[] = $e;
        }
    }

    $entreprises = $resultat;

}

It works well, but I have 1800+ entries in $entreprises, so the server stop serving the page after a few seconds.

This is why I want to use an embed mecanism in ezpublish to search through a tree without having to take care of accents as it does in the administration panel.

@Adrien : You should tke a look at iconv. I discovered it a few weeks ago and it is very usefull.

Thanks.

Franck Magnan

Tuesday 01 February 2011 11:59:52 am

Hello Damien,
I don't think it's an eZ Publish issue but a MySQL issue. Do you use MySQL as database?
I took your code and it's working on my eZ Publish instance. When I search for "Chateau-Thierry", I find my "Château-Thierry" content and my sql query looks like:

SELECT DISTINCT
ezcontentobject.*,
ezcontentobject_tree.*,
ezcontentclass.serialized_name_list as class_serialized_name_list,
ezcontentclass.identifier as class_identifier,
ezcontentclass.is_container as is_container

, ezcontentobject_name.name as name,  ezcontentobject_name.real_translation 


FROM
ezcontentobject_tree,
ezcontentobject,ezcontentclass
, ezcontentobject_name 

, ezcontentobject_attribute a1 


WHERE
ezcontentobject_tree.path_string like '/1/2/%' and  



a1.contentobject_id = ezcontentobject.id AND
a1.contentclassattribute_id = 343 AND
a1.version = ezcontentobject_name.content_version AND 
( a1.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 2 ) )
<
( a1.language_id & 1 )
+ ( ( a1.language_id & 2 ) )
) 
AND                             ( a1.sort_key_string LIKE '%Chateau-Thierry%'  ) AND 
ezcontentclass.version=0 AND
ezcontentobject_tree.node_id != 2 AND
ezcontentobject_tree.contentobject_id = ezcontentobject.id  AND
ezcontentclass.id = ezcontentobject.contentclass_id AND

ezcontentobject.contentclass_id  IN  ( 45 ) AND
ezcontentobject_tree.contentobject_id = ezcontentobject_name.contentobject_id and
ezcontentobject_tree.contentobject_version = ezcontentobject_name.content_version and 
( ezcontentobject_name.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 2 ) )
<
( ezcontentobject_name.language_id & 1 )
+ ( ( ezcontentobject_name.language_id & 2 ) )
) 

AND ezcontentobject_tree.is_invisible = 0
AND ((ezcontentobject.section_id in (1))) 

AND 
ezcontentobject.language_mask & 3 > 0 

ORDER BY ezcontentobject_name.name ASC

You can enable output debug and sql debug to compare your sql query with mine.
If it does not work in your environment, I suppose your MySQL server is not configured like mine, especially the collation. You may be take a look to that post

--
Developer at Open Wide

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.

eZ debug

Timing: Jan 18 2025 02:02:35
Script start
Timing: Jan 18 2025 02:02:35
Module start 'layout'
Timing: Jan 18 2025 02:02:35
Module start 'content'
Timing: Jan 18 2025 02:02:36
Module end 'content'
Timing: Jan 18 2025 02:02:36
Script end

Main resources:

Total runtime0.8319 sec
Peak memory usage4,096.0000 KB
Database Queries64

Timing points:

CheckpointStart (sec)Duration (sec)Memory at start (KB)Memory used (KB)
Script start 0.00000.0072 594.3047152.6406
Module start 'layout' 0.00720.0033 746.945339.4844
Module start 'content' 0.01040.8199 786.4297624.0781
Module end 'content' 0.83030.0015 1,410.507820.1250
Script end 0.8319  1,430.6328 

Time accumulators:

 Accumulator Duration (sec) Duration (%) Count Average (sec)
Ini load
Load cache0.00480.5734160.0003
Check MTime0.00190.2299160.0001
Mysql Total
Database connection0.00070.089710.0007
Mysqli_queries0.768192.3323640.0120
Looping result0.00070.0787620.0000
Template Total0.799396.120.3996
Template load0.00240.286020.0012
Template processing0.796995.788620.3984
Template load and register function0.00010.014910.0001
states
state_id_array0.00080.094710.0008
state_identifier_array0.00100.117420.0005
Override
Cache load0.00200.2421410.0000
Sytem overhead
Fetch class attribute can translate value0.00090.113830.0003
Fetch class attribute name0.00110.136760.0002
XML
Image XML parsing0.00100.115430.0003
class_abstraction
Instantiating content class attribute0.00000.001770.0000
General
dbfile0.00180.2144210.0001
String conversion0.00000.001040.0000
Note: percentages do not add up to 100% because some accumulators overlap

Templates used to render the page:

UsageRequested templateTemplateTemplate loadedEditOverride
1node/view/full.tplfull/forum_topic.tplextension/sevenx/design/simple/override/templates/full/forum_topic.tplEdit templateOverride template
3content/datatype/view/ezimage.tpl<No override>extension/sevenx/design/simple/templates/content/datatype/view/ezimage.tplEdit templateOverride template
4content/datatype/view/ezxmltext.tpl<No override>extension/community_design/design/suncana/templates/content/datatype/view/ezxmltext.tplEdit templateOverride template
10content/datatype/view/ezxmltags/paragraph.tpl<No override>extension/ezwebin/design/ezwebin/templates/content/datatype/view/ezxmltags/paragraph.tplEdit templateOverride template
5content/datatype/view/ezxmltags/literal.tpl<No override>extension/community/design/standard/templates/content/datatype/view/ezxmltags/literal.tplEdit templateOverride template
2content/datatype/view/ezxmltags/link.tpl<No override>design/standard/templates/content/datatype/view/ezxmltags/link.tplEdit templateOverride template
2content/datatype/view/ezxmltags/line.tpl<No override>design/standard/templates/content/datatype/view/ezxmltags/line.tplEdit templateOverride template
1print_pagelayout.tpl<No override>extension/community/design/community/templates/print_pagelayout.tplEdit templateOverride template
 Number of times templates used: 28
 Number of unique templates used: 8

Time used to render debug report: 0.0002 secs