This article was originally published on the old phpBB.cc site on April 10, 2006. As of today, the phpBB 2.0.x visual confirmation remains unchanged.
The recently released phpBB version 2.0.20 comes with “Visual Confirmation” enabled by default. Too little, too late. The phpBB CAPTCHA has been successfully broken. Moreover, it is unbearably easy to break. A small desktop application that demonstrates this vulnerability is presented below.
This article is not intended to encourage spambot development. Neither the described application nor the underlying algorithm will be offered for sale. The author merely hopes to raise phpBB users’ and developers’ awareness of the need to replace the CAPTCHA with a better one ASAP. He was contacted by a member of the phpBB development team after the first draft was published. The developer declined to identify him/herself, but promised that efforts to solve the problem are under way.
The author provides phpBB customization services. If you want to make your board more secure, you may send him an email.
Initial image to crack:
First, the path to the CAPTCHA image has to be specified. It typically looks like:
http://<domain><script_path> ... <id>
Depending on the cookie settings, it may be followed by an additional sid parameter.
The phpBB script produces a Portable Network Graphics (PNG) image. It performs no check whatsoever for repeated requests, so an unlimited number of variations may be produced, allowing ambiguous OCR results to be eliminated.
Step 1: Background cleanup
The first step of the algorithm eliminates the background noise, leaving distinct character shapes. This step is trivial because the noise is extremely simple.
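The article does not say how the noise is removed. One common approach, shown here purely as an illustration, is a fixed brightness threshold: anything darker than the cutoff is treated as character ink and everything else as background. The function name `remove_background`, the cutoff of 128, and the toy pixel values are all assumptions, not details of the original application.

```python
def remove_background(gray, threshold=128):
    """Binarize a grayscale image given as a list of rows of 0-255 values.

    Assumption: characters are darker than the background noise, so a
    single fixed cutoff separates them. Returns 1 for ink, 0 for background.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Toy 3x5 image: light, noisy background with a dark vertical stroke.
image = [
    [230, 200, 40, 210, 255],
    [190, 220, 35, 240, 205],
    [250, 180, 50, 195, 225],
]

binary = remove_background(image)
for row in binary:
    print("".join("#" if v else "." for v in row))
```

A fixed cutoff only works when the noise really is much lighter than the characters, which is the situation the paragraph above describes.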
Step 2: Foreground enhancement
The second step produces cleaner character images by running a 3×3 convolution over the intermediate image matrix.
If performance is critical, this step should be preceded by border detection. The order in which the steps are presented in this article was chosen for clarity.
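The kernel used by the original application is not given. As a rough sketch, the code below assumes a 3×3 box kernel (all ones) applied to a binary image, keeping an ink pixel only if enough of its neighbourhood is also ink. The names `convolve3x3` and `enhance` and the `min_sum` cutoff are illustrative choices.

```python
def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to a 2D list, treating out-of-range pixels as 0.

    The kernel is not flipped; for a symmetric kernel such as the box
    kernel below, this gives the same result as a true convolution.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out


BOX = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # assumed kernel, not from the article


def enhance(binary, min_sum=3):
    """Keep an ink pixel only if its 3x3 neighbourhood holds >= min_sum ink pixels."""
    sums = convolve3x3(binary, BOX)
    return [
        [1 if binary[y][x] and sums[y][x] >= min_sum else 0
         for x in range(len(binary[0]))]
        for y in range(len(binary))
    ]


noisy = [
    [1, 0, 0, 0, 0],  # isolated speck in the top-left corner
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cleaned = enhance(noisy)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

In this toy input the isolated speck has a neighbourhood sum of 1 and is dropped, while each pixel of the 2×2 block has a sum of 4 and survives.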
Step 3: Border detection
In this step, a bounding box for each character is detected.
As noted above, performing border detection before foreground enhancement would be more efficient.
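A minimal sketch of border detection follows. It assumes that characters never touch horizontally, so each run of consecutive columns containing ink is one character. This column-projection approach is a generic technique chosen for illustration, and `find_boxes` is a hypothetical name; the article does not describe the method actually used.

```python
def find_boxes(binary):
    """Return one (left, top, right, bottom) box per run of inked columns.

    Assumption: characters are separated by at least one empty column.
    Coordinates are inclusive and zero-based.
    """
    h, w = len(binary), len(binary[0])
    col_has_ink = [any(binary[y][x] for y in range(h)) for x in range(w)]
    boxes = []
    x = 0
    while x < w:
        if not col_has_ink[x]:
            x += 1
            continue
        left = x
        while x < w and col_has_ink[x]:
            x += 1
        right = x - 1
        # Trim empty rows above and below the character.
        rows = [y for y in range(h) if any(binary[y][left:right + 1])]
        boxes.append((left, rows[0], right, rows[-1]))
    return boxes


binary = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
]
print(find_boxes(binary))
```

Running detection first would let the convolution in step 2 be restricted to the pixels inside each box, which is the efficiency gain mentioned above.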
Step 4: Font matching
Each of the sub-images extracted in the previous step is compared to well-known font images. For each sub-image, the character corresponding to the best match is selected.
The string is JGRP1O
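Font matching can be illustrated with a simple pixel-agreement score: count how many pixels of the extracted glyph agree with each reference template and pick the template with the highest count. The tiny 3×3 templates below are invented for this example, and the sketch assumes the glyph has already been scaled to the template size. The original application's scoring method is not stated.

```python
# Made-up 3x3 reference glyphs; a real font table would be much larger.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def match_score(glyph, template):
    """Count the pixels on which two equally sized glyphs agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def best_match(glyph, templates=TEMPLATES):
    """Return the character whose template agrees with the glyph on most pixels."""
    return max(templates, key=lambda ch: match_score(glyph, templates[ch]))


# An "L" with one stray ink pixel in the middle.
glyph = ["100", "110", "111"]
print(best_match(glyph))
```

Here the noisy glyph agrees with the "L" template on 8 of 9 pixels and with the other two templates on only 4, so "L" is chosen despite the stray pixel.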
The algorithm successfully recognizes most characters. One variation may occasionally mistake S for B, another 3 for 8. Combining the two variations, however, eliminates this problem.
As noted previously, an unlimited number of queries with the same id may be made, each producing a different image for the same string. In addition, two retries are allowed. Repeated queries combined with retries make recognition mistakes negligible.
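One way to see why repeated readings help is a per-position majority vote: if several recognition attempts for the same string disagree on a character, the most frequent candidate wins. The sketch below is an assumption about how results might be combined, not the author's stated method, and the list of readings is made up for illustration.

```python
from collections import Counter


def combine_readings(readings):
    """Majority-vote each character position across equally long readings."""
    return "".join(
        Counter(chars).most_common(1)[0][0]
        for chars in zip(*readings)
    )


# Hypothetical recognition results for three images of the same string;
# the second one confuses the letter O with the digit 0.
readings = ["JGRP1O", "JGRP10", "JGRP1O"]
print(combine_readings(readings))
```

With an odd number of readings and independent errors, a single misread character is outvoted, which is consistent with the claim that mistakes become negligible.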