wiki

Post by jurchiks » Tue Jul 10, 2012 10:24 am

I've been updating a few pages in the wiki over the past couple of days, and I came across a somewhat dumb word filter. I tried putting the word "afford" in the text, but it said "ford found, gtfo!"...
Perhaps the word filter should check for whitespace/punctuation around the word?
Because if it's not "/[\s.,_+!@#\$-]+ford[\s.,_+!@#\$-]+/iu" or something like that, then it's not the word you're looking for.
http://www.morewords.com/contains/ford/ shows the words containing "ford" in them.
There are definitely other words that could be mis-filtered, and IMHO they shouldn't be getting in the way like this.
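
To make the difference concrete, here is a minimal sketch comparing a plain substring match with a word-boundary match. It is written in Python for brevity (the same `\b` syntax works in PHP's preg functions), and the `BLOCKED` list and `is_blocked` helper are made up for illustration; this is not the wiki's actual filter.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED = ["ford", "audi", "mazda"]
ALTERNATION = "|".join(re.escape(word) for word in BLOCKED)

# Naive filter: matches a blocked word anywhere, even inside a longer word.
naive = re.compile(ALTERNATION, re.IGNORECASE)

# Bounded filter: \b only matches at the edge between a word character
# and a non-word character (or the start/end of the text).
bounded = re.compile(r"\b(?:" + ALTERNATION + r")\b", re.IGNORECASE)


def is_blocked(pattern, text):
    """Return True if the compiled pattern matches anywhere in the text."""
    return pattern.search(text) is not None


for sample in ["I can't afford it", "Buy a Ford today!", "Oxford, Stanford, Bradford"]:
    print(sample, "| naive:", is_blocked(naive, sample), "| bounded:", is_blocked(bounded, sample))
```

With the bounded pattern only the standalone "Ford" is caught, while "afford" and "Oxford" pass through.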
If you have problems, FIRST TRY SOLVING THEM YOURSELF, and if you get errors, TRY TO ANALYZE THEM, and ONLY if you can't help it, THEN ask here.
Otherwise you will never learn anything if all you do is copy-paste!
Discussion breeds innovation.

Post by ThePhoenixBird » Tue Jul 10, 2012 6:21 pm

Too many Ford car loan, insurance and other spam ads on the wiki.

You are right, maybe a more complex regex would fix that.

Post by jurchiks » Tue Jul 10, 2012 6:43 pm

What, are there that many despite registration and captcha?

Post by Zoey76 » Tue Jul 10, 2012 6:45 pm

ThePhoenixBird wrote: Too many Ford car loan, insurance and other spam ads on the wiki.

You are right, maybe a more complex regex would fix that.

Image
Using Eclipse 4.12 - OpenJDK11 - MariaDB 10.4 - L2J Server 2.6.1.0

Post by ThePhoenixBird » Wed Jul 11, 2012 11:31 pm

jurchiks wrote: What, are there that many despite registration and captcha?
Hell yeah, actually that crazy regex spam word filter is what stops bots from posting junk in the wiki.

Check how the bots bypass the captchas in the user registration log: http://l2jserver.com/wiki/Special:Log/newusers

Post by jurchiks » Thu Jul 12, 2012 7:57 am

What's your regex like? Is it smth like this?

Code:

$someString = 'audi|mazda|ford';
$regex = "/\b($someString)\b/iu";
$result = preg_match_all($regex, $yourTextHere, $matches);
if ($result)
{
    print_r($matches);
}

Post by ThePhoenixBird » Mon Sep 17, 2012 1:38 am

Code:

## Spam Regex
$wgSpamRegex = "/".                        # The "/" is the opening wrapper
                "s-e-x|zoofilia|sexyongpin|grusskarte|geburtstagskarten|animalsex|".
                "job|bureau|employer|jobster|salary|jobs|bangalore|india|employee|blackberry|".
                "freelancing|career|medical|airlines|government|florida|".
                "toyota|mercedes|benz|chevrolet|honda|".
                "diet|snack|carbohydrates|diets|cholesterol|vitamins|".
                "brokers|banks|insurance|fargo|commonwealth|bank|credit|federal|wachovia|".
                "sex-with|dogsex|adultchat|adultlive|camsex|sexcam|livesex|sexchat|footjob|".
                "chatsex|onlinesex|adultporn|adultvideo|adultweb.|hardcoresex|hardcoreporn|".
                "teenporn|xxxporn|lesbiansex|livegirl|livenude|livesex|livevideo|camgirl|pussy|".
                "spycam|voyeursex|casino-online|online-casino|kontaktlinsen|cheapest-phone|".
                "laser-eye|eye-laser|fuelcellmarket|lasikclinic|cragrats|parishilton|".
                "paris-hilton|paris-tape|2large|fuel-dispenser|fueling-dispenser|huojia|".
                "jinxinghj|telematicsone|telematiksone|a-mortgage|diamondabrasives|".
                "reuterbrook|sex-plugin|sex-zone|lazy-stars|eblja|liuhecai|".
                "buy-viagra|-cialis|-levitra|boy-and-girl-kissing|". # These match spammy words
                "dirare\.com|".           # This matches dirare.com a spammer's domain name
                "overflow\s*:\s*auto|".   # This matches against overflow:auto (regardless of whitespace on either side of the colon)
                "height\s*:\s*[0-4]px|".  # This matches against height:0px (most CSS hidden spam) (regardless of whitespace on either side of the colon)
                "\<\s*a\s*href|".         # This blocks all href links entirely, forcing wiki syntax
                "display\s*:\s*none".     # This matches against display:none (regardless of whitespace on either side of the colon)
                "/i";                     # The "/" ends the regular expression and the "i" switch which follows makes the test case-insensitive
                                          # The "\s" matches whitespace
                                          # The "*" is a repeater (zero or more times)
                                          # The "\s*" means to look for 0 or more amount of whitespace

Post by ThePhoenixBird » Mon Sep 17, 2012 1:39 am

The regex is based mostly on the words used to spam our wiki.

Post by jurchiks » Mon Sep 17, 2012 8:04 am

1) if you put "sex" and "porn" in there, you can throw out all words that contain those words in them... shortens the regex by a good amount.
2) job/jobs - only the former is necessary...
3) I'd put just "viagra" instead of "buy-viagra".
4) what is the actual code that parses this pattern? Does it go directly into preg_match*?
And what's with all the "-"?
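
Points 1) and 2) boil down to one rule: in an unanchored alternation, any word that contains another listed word is redundant. A rough sketch of that pruning, in Python for brevity; `prune_redundant` and the sample list are made up for illustration, and the sketch assumes plain literal words (no regex syntax like `\s*`):

```python
def prune_redundant(words):
    """Drop every word that contains another, shorter listed word as a substring."""
    # Visit shorter words first so longer ones can be checked against them.
    candidates = sorted(set(words), key=lambda word: (len(word), word))
    kept = []
    for word in candidates:
        if not any(shorter in word for shorter in kept):
            kept.append(word)
    return kept


# Hypothetical subset of a spam word list, with bare "sex" and "porn" added.
sample = ["job", "jobs", "jobster", "sex", "animalsex", "camsex",
          "sexchat", "porn", "teenporn", "toyota"]
print(prune_redundant(sample))
```

This only holds while the pattern matches substrings. Once `\b` is wrapped around the alternation, "job" no longer covers "jobs", so the longer forms would have to stay. Bare short words also bring back the "afford" problem (think "Essex").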