Bayesian spam filter for comments, feedbacks form ...

Quoc Huy Nguyen Dinh

Friday 16 April 2010 7:45:54 am

I'm working on an extension that automatically filters comments, feedback forms and other user postings using a Bayesian spam algorithm.

The filter gets more accurate as you teach it what is spam and what is ham, so its results can be a bit random at first.
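To illustrate the "teach it" part, here is a minimal sketch of a naive Bayes filter in Python. It is not the extension's code; the class name `NaiveBayesFilter`, its methods and the sample messages are all made up for the example, and a real filter would need better tokenizing and much more training data.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical minimal naive Bayes spam filter (illustration only)."""

    def __init__(self):
        # Per-label token frequencies and number of trained messages.
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Very rough tokenizer: lowercase words and digits only.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # label must be "spam" or "ham".
        self.message_counts[label] += 1
        self.token_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        total = sum(self.message_counts.values())
        if total == 0:
            return 0.5  # nothing learned yet, so no opinion
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        tokens = self.tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior for the label.
            score = math.log((self.message_counts[label] + 1) / (total + 2))
            n_tokens = sum(self.token_counts[label].values())
            for tok in tokens:
                # Add-one smoothing so unseen tokens never give log(0).
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (n_tokens + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability for "spam".
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap pills online", "spam")
spam_filter.train("great article thanks", "ham")
spam_filter.train("thanks for sharing this tutorial", "ham")
```

With only four training messages the scores are crude, which is why the filter is unreliable until it has seen a fair number of examples of each kind.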

Was wondering if the community finds this interesting. If so I would share it when ready.

Nicolas Pastorino

Monday 19 April 2010 12:07:20 am

Hi !

This is definitely interesting! Do you have technical details on this solution?

Cheers !

--
Nicolas Pastorino
Director Community - eZ
Member of the Community Project Board

eZ Publish Community on twitter: http://twitter.com/ezcommunity

t : http://twitter.com/jeanvoye
G+ : http://plus.tl/jeanvoye

Quoc Huy Nguyen Dinh

Wednesday 21 April 2010 4:18:37 am

I won't be re-inventing the wheel here.

There are several PHP classes that do this. My plan is to use the following library for the extension:

http://www.phpclasses.org/package/4236-PHP-Detect-spam-in-text-using-Bayesian-techniques.html

It uses a DB table to store the data the script learns.

My extension would have:

  • a module that loads all comments / feedback (should be customizable) and allows you to mark them as SPAM or HAM and send them for learning.
  • a workflow event that would analyze each comment / feedback post (customizable) against the base.
  • a way to moderate messages marked as spam/ham
  • the DB table would be modified to allow a separate base for each siteaccess.
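
As a rough idea of what a separate base per siteaccess could look like, here is a small Python sketch that uses SQLite as a stand-in for the real database. The table name `spam_tokens`, its columns and the helper functions are assumptions made for the example; they are not the schema of the PHP library linked above.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the token store; the table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS spam_tokens (
               siteaccess TEXT NOT NULL,
               token      TEXT NOT NULL,
               spam_count INTEGER NOT NULL DEFAULT 0,
               ham_count  INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (siteaccess, token)
           )"""
    )
    return conn


def learn(conn, siteaccess, text, is_spam):
    """Record every token of a moderated message for one siteaccess."""
    # Column name comes from a fixed choice, never from user input.
    column = "spam_count" if is_spam else "ham_count"
    for token in text.lower().split():
        # Create the row on first sight, then bump the right counter.
        conn.execute(
            "INSERT OR IGNORE INTO spam_tokens (siteaccess, token) VALUES (?, ?)",
            (siteaccess, token),
        )
        conn.execute(
            f"UPDATE spam_tokens SET {column} = {column} + 1 "
            "WHERE siteaccess = ? AND token = ?",
            (siteaccess, token),
        )
    conn.commit()


def token_counts(conn, siteaccess, token):
    """Return (spam_count, ham_count) for a token within one siteaccess."""
    row = conn.execute(
        "SELECT spam_count, ham_count FROM spam_tokens "
        "WHERE siteaccess = ? AND token = ?",
        (siteaccess, token),
    ).fetchone()
    return row if row else (0, 0)


conn = open_store()
learn(conn, "site_a", "cheap pills", is_spam=True)
learn(conn, "site_b", "cheap flights", is_spam=False)
```

Keeping the siteaccess in the primary key means the same word can count as spam on one site and ham on another, without the two bases influencing each other.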

Sebastiaan van der Vliet

Wednesday 21 April 2010 5:03:23 am

Why not use Akismet? http://code.google.com/p/ezakismet/ & http://akismet.com/

Certified eZ publish developer with over 9 years of eZ publish experience. Available for challenging eZ publish projects as a technical consultant, project manager, trouble shooter or strategic advisor.

Quoc Huy Nguyen Dinh

Thursday 22 April 2010 6:31:14 am

Excellent. That might save me hours of dev time :-D

Will test it, thanks for sharing
