Forums / Developer / PHP Search in nodes escaping special characters

PHP Search in nodes escaping special characters

Author Message

Damien MARTIN

Tuesday 01 February 2011 1:26:55 am

Hi there,

I want to do a search using PHP in a tree.

So I use :

$entreprises = eZContentObjectTreeNode::subTreeByNodeID(
    array(
        'ClassFilterType' => 'include',
        'ClassFilterArray' => array('etablissement'),
        'AttributeFilter' => $criteres,
        'SortBy' => array('name', true)
    ),
    2283
);

and where

$criteres[] = array('etablissement/ville', 'like', "*$commune*");

It works very well but I have a problem with accents.

When I'm looking for "Chateau-Thierry" I can't find "Château-Thierry".

But when I search in the back office, I find the same results while typing "chateau" or "château". So I suppose There is something I can do.

But what ?

Could you help me with this ?

Adrien LOCHON

Tuesday 01 February 2011 6:05:16 am

Hi,

I would have used a ID for that. Instead of using a filter on a name, with spaces, accents and things like that, a ID is much better for accuracy.

Or, if you don't want to use any ID or you can't, use a string converter function like this :

function tvReplayImportFiltreBadChars($sContent) {
    $sContent = str_replace(" ", " ", $sContent);
    $sContent = trim($sContent);
    
    $sContent = str_replace("«", "\"", $sContent);
    $sContent = str_replace("»", '"', $sContent);
    $sContent = str_replace("“", "\"", $sContent);
    $sContent = str_replace("”", "\"", $sContent);
    $sContent = str_replace("…", "...", $sContent);
    $sContent = str_replace("´", "'", $sContent);
    $sContent = str_replace("‘", "'", $sContent);
    $sContent = str_replace("’", "'", $sContent);
    $sContent = str_replace("œ", "oe", $sContent);
    $sContent = str_replace(chr(197).chr(34), "oe", $sContent);
    $sContent = str_replace(chr(226) . chr(128) . chr(147), "-", $sContent);
    $sContent = str_replace("Œ", "OE", $sContent);
    $sContent = str_replace("—", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("•", "-", $sContent);
    $sContent = str_replace("Ÿ", "Y", $sContent);
    $sContent = str_replace("ž", "z", $sContent);
    $sContent = str_replace("Ž", "Z", $sContent);
    $sContent = str_replace("Š", "S", $sContent);
    $sContent = str_replace("š", "s", $sContent);
    $sContent = str_replace("›", ">", $sContent);
    $sContent = str_replace("‹", "<", $sContent);
    $sContent = str_replace("€", "E", $sContent);
    $sContent = str_replace(chr(226).chr(130).chr(172) , 'euros', $sContent );
    $sContent = str_replace(chr(195).chr(169), "é", $sContent);
    $sContent = str_replace(chr(195).chr(34), "û", $sContent);
    $sContent = str_replace(chr(195).chr(170), "ê", $sContent);
    $sContent = str_replace(chr(195).chr(168), "è", $sContent);
    $sContent = str_replace(chr(195).chr(32), "à" . chr(32), $sContent);
    $sContent = str_replace(chr(195).chr(162), "â", $sContent);
    $sContent = str_replace(chr(195).chr(69), "à", $sContent);
    $sContent = str_replace(chr(194).chr(34), '"', $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(153), "'", $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(166), "...", $sContent);
    $sContent = str_replace(chr(195).chr(167), "ç", $sContent);
    
    $sContent = trim($sContent);

    $sContent = strtoupper($sContent);
    $aLettres = array(",", ";", ".", ":", "°", "-", "_", "'", '"', "&", " ", "/", "\\", "@", "$", "%", "£", "¤", "µ", "*", "!", "§");
    $sContent = str_replace($aLettres, " ", $sContent);
    $sContent = str_replace(" ", "", $sContent);
    
    $sContent = strtr(utf8_decode($sContent), utf8_decode("ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"), "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn");
    $sContent = strtolower($sContent);
    $sContent = utf8_encode($sContent);
    
    return $sContent;
}

This will change something like "île-de-fortûné" into "iledefortune", avoiding search errors, at least most of them i suppose... I use it a lot to match movies titles between our portal and our partners.

My opinion is : use ids :)

Damien MARTIN

Tuesday 01 February 2011 6:35:02 am

Thank you very much Adrien, but it is not what i'm looking for.

Since this morning I wrote a little piece of code to do what I want :

if($commune != "tous")
{
    // Conversion de la chaine de caractere pour la passer en ASCII
    $commune = iconv("UTF-8", "ASCII//TRANSLIT", $commune);

    $resultat = array();
    foreach($entreprises as $e)
    {
        $c = eZContentObject::fetchByNodeID($e->NodeID);
        $dm = $c->DataMap();

        $ville = $dm['ville']->DataText;
        $ville = iconv("UTF-8", "ASCII//TRANSLIT", $ville);

        $commune = strtoupper($commune);
        $ville = strtoupper($ville);

        $commune = str_replace("-", " ", $commune);
        $ville = str_replace("-", " ", $ville);

        //echo "<!-- $commune / $ville -->\n";

        unset($dm);
        unset($c);

        if($commune == $ville)
        {
            $resultat[] = $e;
        }
    }

    $entreprises = $resultat;

}

It works well, but I have 1800+ entries in $entreprises, so the server stop serving the page after a few seconds.

This is why I want to use an embed mecanism in ezpublish to search through a tree without having to take care of accents as it does in the administration panel.

@Adrien : You should tke a look at iconv. I discovered it a few weeks ago and it is very usefull.

Thanks.

Franck Magnan

Tuesday 01 February 2011 11:59:52 am

Hello Damien,
I don't think it's an eZ Publish issue but a MySQL issue. Do you use MySQL as database?
I took your code and it's working on my eZ Publish instance. When I search for "Chateau-Thierry", I find my "Château-Thierry" content and my sql query looks like:

SELECT DISTINCT
ezcontentobject.*,
ezcontentobject_tree.*,
ezcontentclass.serialized_name_list as class_serialized_name_list,
ezcontentclass.identifier as class_identifier,
ezcontentclass.is_container as is_container

, ezcontentobject_name.name as name,  ezcontentobject_name.real_translation 


FROM
ezcontentobject_tree,
ezcontentobject,ezcontentclass
, ezcontentobject_name 

, ezcontentobject_attribute a1 


WHERE
ezcontentobject_tree.path_string like '/1/2/%' and  



a1.contentobject_id = ezcontentobject.id AND
a1.contentclassattribute_id = 343 AND
a1.version = ezcontentobject_name.content_version AND 
( a1.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 2 ) )
<
( a1.language_id & 1 )
+ ( ( a1.language_id & 2 ) )
) 
AND                             ( a1.sort_key_string LIKE '%Chateau-Thierry%'  ) AND 
ezcontentclass.version=0 AND
ezcontentobject_tree.node_id != 2 AND
ezcontentobject_tree.contentobject_id = ezcontentobject.id  AND
ezcontentclass.id = ezcontentobject.contentclass_id AND

ezcontentobject.contentclass_id  IN  ( 45 ) AND
ezcontentobject_tree.contentobject_id = ezcontentobject_name.contentobject_id and
ezcontentobject_tree.contentobject_version = ezcontentobject_name.content_version and 
( ezcontentobject_name.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 2 ) )
<
( ezcontentobject_name.language_id & 1 )
+ ( ( ezcontentobject_name.language_id & 2 ) )
) 

AND ezcontentobject_tree.is_invisible = 0
AND ((ezcontentobject.section_id in (1))) 

AND 
ezcontentobject.language_mask & 3 > 0 

ORDER BY ezcontentobject_name.name ASC

You can enable output debug and sql debug to compare your sql query with mine.
If it does not work in your environment, I suppose your MySQL server is not configured like mine, especially the collation. You may be take a look to that post

--
Developer at Open Wide

eZ debug

Timing: Jan 17 2025 23:40:59
Script start
Timing: Jan 17 2025 23:40:59
Module start 'content'
Timing: Jan 17 2025 23:41:00
Module end 'content'
Timing: Jan 17 2025 23:41:00
Script end

Main resources:

Total runtime1.0849 sec
Peak memory usage4,096.0000 KB
Database Queries201

Timing points:

CheckpointStart (sec)Duration (sec)Memory at start (KB)Memory used (KB)
Script start 0.00000.0060 589.2734180.8203
Module start 'content' 0.00600.9670 770.0938623.2500
Module end 'content' 0.97300.1118 1,393.3438345.3672
Script end 1.0848  1,738.7109 

Time accumulators:

 Accumulator Duration (sec) Duration (%) Count Average (sec)
Ini load
Load cache0.00390.3627210.0002
Check MTime0.00150.1353210.0001
Mysql Total
Database connection0.00090.083010.0009
Mysqli_queries1.008792.97592010.0050
Looping result0.00200.18211990.0000
Template Total1.059597.720.5298
Template load0.00240.222920.0012
Template processing1.057197.433520.5285
Template load and register function0.00010.008610.0001
states
state_id_array0.00080.069210.0008
state_identifier_array0.00070.064320.0003
Override
Cache load0.00190.1779410.0000
Sytem overhead
Fetch class attribute can translate value0.00140.131640.0004
Fetch class attribute name0.00130.117960.0002
XML
Image XML parsing0.00130.123840.0003
class_abstraction
Instantiating content class attribute0.00000.003870.0000
General
dbfile0.00360.3279320.0001
String conversion0.00000.000630.0000
Note: percentages do not add up to 100% because some accumulators overlap

CSS/JS files loaded with "ezjscPacker" during request:

CacheTypePacklevelSourceFiles
CSS0extension/community/design/community/stylesheets/ext/jquery.autocomplete.css
extension/community_design/design/suncana/stylesheets/scrollbars.css
extension/community_design/design/suncana/stylesheets/tabs.css
extension/community_design/design/suncana/stylesheets/roadmap.css
extension/community_design/design/suncana/stylesheets/content.css
extension/community_design/design/suncana/stylesheets/star-rating.css
extension/community_design/design/suncana/stylesheets/syntax_and_custom_tags.css
extension/community_design/design/suncana/stylesheets/buttons.css
extension/community_design/design/suncana/stylesheets/tweetbox.css
extension/community_design/design/suncana/stylesheets/jquery.fancybox-1.3.4.css
extension/bcsmoothgallery/design/standard/stylesheets/magnific-popup.css
extension/sevenx/design/simple/stylesheets/star_rating.css
extension/sevenx/design/simple/stylesheets/libs/fontawesome/css/all.min.css
extension/sevenx/design/simple/stylesheets/main.v02.css
extension/sevenx/design/simple/stylesheets/main.v02.res.css
JS0extension/ezjscore/design/standard/lib/yui/3.17.2/build/yui/yui-min.js
extension/ezjscore/design/standard/javascript/jquery-3.7.0.min.js
extension/community_design/design/suncana/javascript/jquery.ui.core.min.js
extension/community_design/design/suncana/javascript/jquery.ui.widget.min.js
extension/community_design/design/suncana/javascript/jquery.easing.1.3.js
extension/community_design/design/suncana/javascript/jquery.ui.tabs.js
extension/community_design/design/suncana/javascript/jquery.hoverIntent.min.js
extension/community_design/design/suncana/javascript/jquery.popmenu.js
extension/community_design/design/suncana/javascript/jScrollPane.js
extension/community_design/design/suncana/javascript/jquery.mousewheel.js
extension/community_design/design/suncana/javascript/jquery.cycle.all.js
extension/sevenx/design/simple/javascript/jquery.scrollTo.js
extension/community_design/design/suncana/javascript/jquery.cookie.js
extension/community_design/design/suncana/javascript/ezstarrating_jquery.js
extension/community_design/design/suncana/javascript/jquery.initboxes.js
extension/community_design/design/suncana/javascript/app.js
extension/community_design/design/suncana/javascript/twitterwidget.js
extension/community_design/design/suncana/javascript/community.js
extension/community_design/design/suncana/javascript/roadmap.js
extension/community_design/design/suncana/javascript/ez.js
extension/community_design/design/suncana/javascript/ezshareevents.js
extension/sevenx/design/simple/javascript/main.js

Templates used to render the page:

UsageRequested templateTemplateTemplate loadedEditOverride
1node/view/full.tplfull/forum_topic.tplextension/sevenx/design/simple/override/templates/full/forum_topic.tplEdit templateOverride template
3content/datatype/view/ezimage.tpl<No override>extension/sevenx/design/simple/templates/content/datatype/view/ezimage.tplEdit templateOverride template
4content/datatype/view/ezxmltext.tpl<No override>extension/community_design/design/suncana/templates/content/datatype/view/ezxmltext.tplEdit templateOverride template
10content/datatype/view/ezxmltags/paragraph.tpl<No override>extension/ezwebin/design/ezwebin/templates/content/datatype/view/ezxmltags/paragraph.tplEdit templateOverride template
5content/datatype/view/ezxmltags/literal.tpl<No override>extension/community/design/standard/templates/content/datatype/view/ezxmltags/literal.tplEdit templateOverride template
2content/datatype/view/ezxmltags/link.tpl<No override>design/standard/templates/content/datatype/view/ezxmltags/link.tplEdit templateOverride template
2content/datatype/view/ezxmltags/line.tpl<No override>design/standard/templates/content/datatype/view/ezxmltags/line.tplEdit templateOverride template
1pagelayout.tpl<No override>extension/sevenx/design/simple/templates/pagelayout.tplEdit templateOverride template
 Number of times templates used: 28
 Number of unique templates used: 8

Time used to render debug report: 0.0002 secs