root/releases/0.9/lib/validateurlsyntax.php

Revision 269, 22.4 kB (checked in by ben, 3 years ago)

  • Property svn:eol-style set to native
1 <?php
2 /*
3 About validateUrlSyntax():
4     This function checks whether an HTTP URL is formatted properly and
5     returns either true or false.
6     
7     I used RFC 2396 (URI: Generic Syntax) as my guide when creating the
8     regular expression. For all the details see the comments below.
9
10
11 Usage:
12     validateUrlSyntax( url_to_check[, options])
13
14     url_to_check - string - The url to check
15     
16     options - string - An optional string of options to set which parts of
17             the url are required, optional, or not allowed. Each option
18             must be followed by a "+" for required, "?" for optional, or
19             "-" for not allowed.
20
21             s - Scheme. Allows "+?-", defaults to "s?"
22                 H - http:// Allows "+?-", defaults to "H?"
23                 S - https:// (SSL). Allows "+?-", defaults to "S?"
24                 E - mailto: (email). Allows "+?-", defaults to "E-"
25                 F - ftp:// Allows "+?-", defaults to "F-"
26                     Dependent on scheme being enabled
27             u - User section. Allows "+?-", defaults to "u?"
28                 P - Password in user section. Allows "+?-", defaults to "P?"
29                     Dependent on user section being enabled
30             a - Address (IP or domain). Allows "+?-", defaults to "a+"
31                 I - IP address. Allows "+?-", defaults to "I?"
32                     If I+, then domains are disabled
33                     If I-, then domains are required
34                     Dependent on address being enabled
35             p - Port number. Allows "+?-", defaults to "p?"
36             f - File path. Allows "+?-", defaults to "f?"
37             q - Query section. Allows "+?-", defaults to "q?"
38             r - Fragment (anchor). Allows "+?-", defaults to "r?"
39
40     Paste the function code, or include_once() this template at the top of the page
41     where you wish to use this function.
42
43
44 Examples:
45     validateUrlSyntax('http://george@www.cnn.com/#top')
46
47     validateUrlSyntax('https://games.yahoo.com:8080/board/chess.htm?move=true')
48
49     validateUrlSyntax('http://www.hotmail.com/', 's+u-I-p-q-r-')
50
51     validateUrlSyntax('/directory/file.php#top', 's-u-a-p-f+')
52
53
54     if (validateUrlSyntax('http://www.canowhoopass.com/', 'u-'))
55     {
56         echo 'URL SYNTAX IS VERIFIED';
57     } else {
58         echo 'URL SYNTAX IS ILLEGAL';
59     }
60
61
62 Last Edited:
63     December 15th 2004
64
65
66 Changelog:
67     December 15th 2004
68       -Added new TLDs - .jobs, .mobi, .post and .travel. They are official, but not yet active.
69
70     August 31st, 2004
71       -Fixed bug allowing empty username even when it was required
72       -Changed and added a few options to add extra schemes
73       -Added mailto: ftp:// and http:// options
74       -https option was 'l' now it is 'S' (capital)
75       -Added password option. Now passwords can be disabled while usernames are ok (for email)
76       -IP Address option was 'i' now it is 'I' (capital)
77       -Options are now case sensitive
78       -Added validateEmailSyntax() and validateFtpSyntax() functions below
79
80     August 27th, 2004
81       -IP group range is more specific. Used to allow 0-299. Now it is 0-255
82       -Port range more specific. Used to allow 0-69999. Now it is 0-65535
83       -Fixed bug disallowing 'i-' option.
84       -Changed license to GPL
85
86     July 8th, 2004
87       -Fixed bug disallowing 'l-' option. Thanks Dr. Cheap
88
89     June 15, 2004
90       -Added options parameter to make it easier for people to plug the function in
91        without needing to rework the code.
92       -Split the example application away from the function
93
94     June 1, 2004
95       -Complete rewrite
96       -Now more modular
97         -Easier to disable sections
98         -Easier to port to other languages
99         -Easier to port to verify email addresses
100       -Uses only simple regular expressions so it is more portable
101       -Follows the RFC more closely for domain names. Some "play" domains may break
102       -Renamed from 'verifyUrl()' to 'validateUrlSyntax()'
103       -Removed extra code which added 'http://' and trailing '/' if it was missing
104         -That code was better suited for a massaging function, not verifying
105       -Bug fixes:
106         -Now splits up and forces '/path?query#fragment' order
107         -No longer requires a path when using a query or fragment
108
109     August 29, 2003
110       -Allowed port numbers above 9999. Now allows up to 69999
111             
112     Sometime, 2002
113       -Added new top level domains
114         -aero, coop, museum, name, info, biz, pro
115
116     October 5, 2000
117       -First Version
118
119
120 Intentional Limitations:
121     -Does not verify url actually exists. Only validates the syntax
122     -Strictly follows the RFC standards. Some urls exist in the wild which will
123      not validate, including ones with square brackets in the query section '[]'
124
125
126 Known Problems:
127     -None at this time
128
129
130 Author(s):
131     Rod Apeldoorn - rod(at)canowhoopass(dot)com
132
133     
134 Homepage:
135     http://www.canowhoopass.com/
136
137
138 Thanks!:
139     -WEAV -Several members of Weav helped to test - http://weav.bc.ca/
140     -There were also a number of emails from other developers expressing
141      thanks and suggestions. It is nice to be appreciated. Thanks!
142
143
144 License:
145     Copyright 2004, Rod Apeldoorn
146     
147     This program is free software; you can redistribute it and/or modify
148     it under the terms of the GNU General Public License as published by
149     the Free Software Foundation; either version 2 of the License, or (at
150     your option) any later version.
151
152     This program is distributed in the hope that it will be useful, but
153     WITHOUT ANY WARRANTY; without even the implied warranty of
154     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
155     General Public License for more details.
156
157     You should have received a copy of the GNU General Public License along
158     with this program; if not, write to the Free Software Foundation, Inc.,
159     59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
160
161     To view the license online, go to: http://www.gnu.org/copyleft/gpl.html
162
163
164 Alternate Commercial Licenses:
165     For information regarding alternate licensing, contact me.
166 */
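
// The option handling documented above can be sketched more compactly. This is
// a minimal illustration, not part of the original file: parseOptionFlags()
// and urlOptionDefaults() are hypothetical names, and preg_match() is assumed
// in place of ereg(). As with the strpos() lookups used by validateUrlSyntax(),
// the first occurrence of an option letter wins.

```php
<?php
// Defaults as documented for validateUrlSyntax().
function urlOptionDefaults(): array
{
    return [
        's' => '?', 'H' => '?', 'S' => '?', 'E' => '-', 'F' => '-',
        'u' => '?', 'P' => '?', 'a' => '+', 'I' => '?', 'p' => '?',
        'f' => '?', 'q' => '?', 'r' => '?',
    ];
}

// Hypothetical helper: overlay an option string on a table of defaults.
function parseOptionFlags(string $options, array $defaults): array
{
    // Same shape check as the original: letter/flag pairs only.
    if (!preg_match('/^([sHSEFuPaIpfqr][+?-])*$/D', $options)) {
        throw new InvalidArgumentException('Options attribute malformed');
    }

    $flags = $defaults;
    $seen  = [];
    preg_match_all('/([sHSEFuPaIpfqr])([+?-])/', $options, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        if (!isset($seen[$pair[1]])) {   // first occurrence wins
            $flags[$pair[1]] = $pair[2];
            $seen[$pair[1]]  = true;
        }
    }
    return $flags;
}

// Usage: options that are not mentioned keep their defaults.
$flags = parseOptionFlags('s+u-I-p-q-r-', urlOptionDefaults());
```

// A table of defaults keeps the per-function differences (URL, email, FTP) in
// one place instead of thirteen repeated if/else pairs.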
167
168
169 // BEGINNING OF validateUrlSyntax() function
170 function validateUrlSyntax( $urladdr, $options="" ){
171
172     // Force Options parameter to be lower case
173     // DISABLED PERMANENTLY - OK to remove from code
174     //    $options = strtolower($options);
175
176     // Check Options Parameter
177     if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/D', $options ))
178     {
179         trigger_error("Options attribute malformed", E_USER_ERROR);
180     }
181
182     // Set Options Array, set defaults if options are not specified
183     // Scheme
184     if (strpos( $options, 's') === false) $aOptions['s'] = '?';
185     else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
186     // http://
187     if (strpos( $options, 'H') === false) $aOptions['H'] = '?';
188     else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
189     // https:// (SSL)
190     if (strpos( $options, 'S') === false) $aOptions['S'] = '?';
191     else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
192     // mailto: (email)
193     if (strpos( $options, 'E') === false) $aOptions['E'] = '-';
194     else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
195     // ftp://
196     if (strpos( $options, 'F') === false) $aOptions['F'] = '-';
197     else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
198     // User section
199     if (strpos( $options, 'u') === false) $aOptions['u'] = '?';
200     else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
201     // Password in user section
202     if (strpos( $options, 'P') === false) $aOptions['P'] = '?';
203     else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
204     // Address Section
205     if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
206     else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
207     // IP Address in address section
208     if (strpos( $options, 'I') === false) $aOptions['I'] = '?';
209     else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
210     // Port number
211     if (strpos( $options, 'p') === false) $aOptions['p'] = '?';
212     else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
213     // File Path
214     if (strpos( $options, 'f') === false) $aOptions['f'] = '?';
215     else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
216     // Query Section
217     if (strpos( $options, 'q') === false) $aOptions['q'] = '?';
218     else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
219     // Fragment (Anchor)
220     if (strpos( $options, 'r') === false) $aOptions['r'] = '?';
221     else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);
222
223
224     // Loop through the options array, replacing "-" with "{0}" and "+" with "" so each value can be appended as a regex quantifier
225     foreach($aOptions as $key => $value)
226     {
227         if ($value == '-')
228         {
229             $aOptions[$key] = '{0}';
230         }
231         if ($value == '+')
232         {
233             $aOptions[$key] = '';
234         }
235     }
236     
237     // DEBUGGING - Uncomment the following line to display the current option values
238     // echo '<pre>'; print_r($aOptions); echo '</pre>';
239
240
241     // Preset Allowed Characters
242     $alphanum    = '[a-zA-Z0-9]';  // Alpha Numeric
243     $unreserved  = '[a-zA-Z0-9_.!~*' . '\'' . '()-]';
244     $escaped     = '(%[0-9a-fA-F]{2})'; // Escape sequence - In Hex - %6d would be a 'm'
245     $reserved    = '[;/?:@&=+$,]'; // Special characters in the URI
246     
247     // Beginning Regular Expression
248                        // Scheme - Allows for 'http://', 'https://', 'mailto:', or 'ftp://'
249     $scheme            = '(';
250     if     ($aOptions['H'] === '') { $scheme .= 'http://'; }
251     elseif ($aOptions['S'] === '') { $scheme .= 'https://'; }
252     elseif ($aOptions['E'] === '') { $scheme .= 'mailto:'; }
253     elseif ($aOptions['F'] === '') { $scheme .= 'ftp://'; }
254     else
255     {
256         if ($aOptions['H'] === '?') { $scheme .= '|(http://)'; }
257         if ($aOptions['S'] === '?') { $scheme .= '|(https://)'; }
258         if ($aOptions['E'] === '?') { $scheme .= '|(mailto:)'; }
259         if ($aOptions['F'] === '?') { $scheme .= '|(ftp://)'; }
260         $scheme = str_replace('(|', '(', $scheme); // fix first pipe
261     }
262     $scheme            .= ')' . $aOptions['s'];
263     // End setting scheme
264     
265                        // User Info - Allows for 'username@' or 'username:password@'. Note: contrary to rfc, I removed ':' from username section, allowing it only in password.
266                        //   /---------------- Username -----------------------\  /-------------------------------- Password ------------------------------\
267     $userinfo          = '((' . $unreserved . '|' . $escaped . '|[;&=+$,]' . ')+(:(' . $unreserved . '|' . $escaped . '|[;:&=+$,]' . ')+)' . $aOptions['P'] . '@)' . $aOptions['u'];
268     
269                        // IP ADDRESS - Allows 0.0.0.0 to 255.255.255.255
270     $ipaddress         = '((((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9]))\.){3}((2(([0-4][0-9])|(5[0-5])))|([01]?[0-9]?[0-9])))';
271     
272                        // Tertiary Domain(s) - Optional - Multi - Although some sites may use other characters, the RFC says tertiary domains have the same naming restrictions as second level domains
273     $domain_tertiary   = '(' . $alphanum . '(([a-zA-Z0-9-]{0,62})' . $alphanum . ')?\.)*';
274
275                        // Second Level Domain - Required - First and last characters must be Alpha-numeric. Hyphens are allowed inside.
276     $domain_secondary  = '(' . $alphanum . '(([a-zA-Z0-9-]{0,62})' . $alphanum . ')?\.)';
277     
278     /* // This regex is disabled on purpose in favour of the more exact version below
279                        // Top Level Domain - First character must be Alpha. Last character must be AlphaNumeric. Hyphens are allowed inside.
280     $domain_toplevel   = '([a-zA-Z](([a-zA-Z0-9-]*)[a-zA-Z0-9])?)';
281     */
282     
283                        // Top Level Domain - Required - Domain List Current As Of December 2004. Use the commented-out line above to be forgiving of possible future TLDs
284     $domain_toplevel   = '(aero|biz|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|post|pro|travel|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|az|ax|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)';
285     
286
287                        // Address can be IP address or Domain
288     if ($aOptions['I'] === '{0}') {       // IP Address Not Allowed
289         $address       = '(' . $domain_tertiary . $domain_secondary . $domain_toplevel . ')';
290     } elseif ($aOptions['I'] === '') {  // IP Address Required
291         $address       = '(' . $ipaddress . ')';
292     } else {                            // IP Address Optional
293         $address       = '((' . $ipaddress . ')|(' . $domain_tertiary . $domain_secondary . $domain_toplevel . '))';
294     }
295     $address = $address . $aOptions['a'];
296     
297                        // Port Number - :80 or :8080 or :65534 Allows range of :0 to :65535
298                        //    (0-59999)         |(60000-64999)   |(65000-65499)    |(65500-65529)  |(65530-65535)
299     $port_number       = '(:(([0-5]?[0-9]{1,4})|(6[0-4][0-9]{3})|(65[0-4][0-9]{2})|(655[0-2][0-9])|(6553[0-5])))' . $aOptions['p'];
300     
301                        // Path - Can be as simple as '/' or have multiple folders and filenames
302     $path              = '(/?((;)?(' . $unreserved . '|' . $escaped . '|' . '[:@&=+$,]' . ')+(/)?)*)' . $aOptions['f'];
303     
304                        // Query Section - Accepts ?var1=value1&var2=value2 or ?2393,1221 and much more
305     $querystring       = '(\?(' . $reserved . '|' . $unreserved . '|' . $escaped . ')*)' . $aOptions['q'];
306     
307                        // Fragment Section - Accepts anchors such as #top
308     $fragment          = '(#(' . $reserved . '|' . $unreserved . '|' . $escaped . ')*)' . $aOptions['r'];
309     
310     
311     // Building Regular Expression
312     $regexp = '^' . $scheme . $userinfo . $address . $port_number . $path . $querystring . $fragment . '$';
313     
314     // DEBUGGING - Uncomment Line Below To Display The Regular Expression Built
315     // echo '<pre>' . htmlentities(wordwrap($regexp,70,"\n",1)) . '</pre>';
316
317     // Running the regular expression
318     if (preg_match( '`' . $regexp . '`iD', $urladdr ))
319     {
320         return true; // The URL matched the expression
321     }
322     else
323     {
324         return false; // The URL didn't match the expression
325     }
326
327 } // END Function validateUrlSyntax()
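
// The way each option flag becomes a regex quantifier ("+" => "", "?" => "?",
// "-" => "{0}") can be seen in isolation with the port-number group. This is a
// small sketch under stated assumptions: flagToQuantifier() and
// hostWithPortPattern() are hypothetical names, the host is fixed to
// "localhost" for brevity, and preg_match() is assumed rather than eregi().

```php
<?php
// Hypothetical helper: translate an option flag into a regex quantifier.
function flagToQuantifier(string $flag): string
{
    switch ($flag) {
        case '+': return '';      // required: group must match exactly once
        case '-': return '{0}';   // not allowed: group must match zero times
        default:  return '?';     // optional
    }
}

// Hypothetical helper: "localhost" followed by the port group from the
// function above, with the quantifier chosen by $portFlag.
function hostWithPortPattern(string $portFlag): string
{
    //         (0-59999)         |(60000-64999)   |(65000-65499)    |(65500-65529)  |(65530-65535)
    $port = '(:(([0-5]?[0-9]{1,4})|(6[0-4][0-9]{3})|(65[0-4][0-9]{2})|(655[0-2][0-9])|(6553[0-5])))';

    // Backtick delimiters are used because they do not occur in the pattern.
    return '`^localhost' . $port . flagToQuantifier($portFlag) . '$`D';
}

// Usage: the same input passes or fails depending on the flag.
$required   = preg_match(hostWithPortPattern('+'), 'localhost:8080');
$disallowed = preg_match(hostWithPortPattern('-'), 'localhost:8080');
```

// Appending "{0}" rather than deleting the group keeps the overall pattern
// shape identical for every option combination.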
328
329
330
331 /*
332 About validateEmailSyntax():
333     This function uses the validateUrlSyntax() function to easily check the
334     syntax of an email address. It accepts the same options as validateUrlSyntax()
335     but defaults them for email addresses.
336
337
338 Usage:
339     validateEmailSyntax( url_to_check[, options])
340
341     url_to_check - string - The url to check
342     
343     options - string - An optional string of options to set which parts of
344             the url are required, optional, or not allowed. Each option
345             must be followed by a "+" for required, "?" for optional, or
346             "-" for not allowed. See ValidateUrlSyntax() docs for option list.
347
348     The default options are changed to:
349         s-H-S-E?F-u+P-a+I-p-f-q-r-
350
351     This only allows an address of "name@domain".
352
353 Examples:
354     validateEmailSyntax('george@fakemail.com')
355     validateEmailSyntax('mailto:george@fakemail.com', 's+')
356     validateEmailSyntax('george@fakemail.com?subject=Hi%20George', 'q?')
357     validateEmailSyntax('george@212.198.33.12', 'I?')
358     
359
360
361 Author(s):
362     Rod Apeldoorn - rod(at)canowhoopass(dot)com
363
364     
365 Homepage:
366     http://www.canowhoopass.com/
367
368
369 License:
370     Copyright 2004 - Rod Apeldoorn
371
372     Released under same license as validateUrlSyntax(). For details, contact me.
373
374
375 */
376
377 function validateEmailSyntax( $emailaddr, $options="" ){
378
379     // Check Options Parameter
380     if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/D', $options ))
381     {
382         trigger_error("Options attribute malformed", E_USER_ERROR);
383     }
384
385     // Set Options Array, set defaults if options are not specified
386     // Scheme
387     if (strpos( $options, 's') === false) $aOptions['s'] = '-';
388     else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
389     // http://
390     if (strpos( $options, 'H') === false) $aOptions['H'] = '-';
391     else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
392     // https:// (SSL)
393     if (strpos( $options, 'S') === false) $aOptions['S'] = '-';
394     else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
395     // mailto: (email)
396     if (strpos( $options, 'E') === false) $aOptions['E'] = '?';
397     else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);
398     // ftp://
399     if (strpos( $options, 'F') === false) $aOptions['F'] = '-';
400     else $aOptions['F'] = substr( $options, strpos( $options, 'F') + 1, 1);
401     // User section
402     if (strpos( $options, 'u') === false) $aOptions['u'] = '+';
403     else $aOptions['u'] = substr( $options, strpos( $options, 'u') + 1, 1);
404     // Password in user section
405     if (strpos( $options, 'P') === false) $aOptions['P'] = '-';
406     else $aOptions['P'] = substr( $options, strpos( $options, 'P') + 1, 1);
407     // Address Section
408     if (strpos( $options, 'a') === false) $aOptions['a'] = '+';
409     else $aOptions['a'] = substr( $options, strpos( $options, 'a') + 1, 1);
410     // IP Address in address section
411     if (strpos( $options, 'I') === false) $aOptions['I'] = '-';
412     else $aOptions['I'] = substr( $options, strpos( $options, 'I') + 1, 1);
413     // Port number
414     if (strpos( $options, 'p') === false) $aOptions['p'] = '-';
415     else $aOptions['p'] = substr( $options, strpos( $options, 'p') + 1, 1);
416     // File Path
417     if (strpos( $options, 'f') === false) $aOptions['f'] = '-';
418     else $aOptions['f'] = substr( $options, strpos( $options, 'f') + 1, 1);
419     // Query Section
420     if (strpos( $options, 'q') === false) $aOptions['q'] = '-';
421     else $aOptions['q'] = substr( $options, strpos( $options, 'q') + 1, 1);
422     // Fragment (Anchor)
423     if (strpos( $options, 'r') === false) $aOptions['r'] = '-';
424     else $aOptions['r'] = substr( $options, strpos( $options, 'r') + 1, 1);
425
426     // Generate options
427     $newoptions = '';
428     foreach($aOptions as $key => $value)
429     {
430         $newoptions .= $key . $value;
431     }
432
433     // DEBUGGING - Uncomment line below to display generated options
434     // echo '<pre>' . $newoptions . '</pre>';
435
436     // Send to validateUrlSyntax() and return result
437     return validateUrlSyntax( $emailaddr, $newoptions);
438
439 } // END Function validateEmailSyntax()
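
// The wrapper above follows one pattern: start from a set of defaults, apply
// the caller's overrides, then re-serialize the result into an option string
// for validateUrlSyntax(). A minimal sketch of that pattern follows;
// mergeOptionString() is a hypothetical name, not part of the original file,
// and unlike the original, a letter repeated in the overrides takes its last
// value here.

```php
<?php
// Hypothetical helper: merge two option strings, overrides taking precedence.
function mergeOptionString(string $defaults, string $overrides): string
{
    $flags = [];
    foreach ([$defaults, $overrides] as $source) {
        preg_match_all('/([sHSEFuPaIpfqr])([+?-])/', $source, $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $flags[$pair[1]] = $pair[2];   // later sources overwrite earlier ones
        }
    }

    // Re-serialize in the order the letters were first seen (the defaults order).
    $merged = '';
    foreach ($flags as $key => $value) {
        $merged .= $key . $value;
    }
    return $merged;
}

// Usage: email defaults as set in the code above, with the scheme made required.
$emailOptions = mergeOptionString('s-H-S-E?F-u+P-a+I-p-f-q-r-', 's+');
```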
440
441
442
443 /*
444 About validateFtpSyntax():
445     This function uses the validateUrlSyntax() function to easily check the
446     syntax of an FTP address. It accepts the same options as validateUrlSyntax()
447     but defaults them for FTP addresses.
448
449
450 Usage:
451     validateFtpSyntax( url_to_check[, options])
452
453     url_to_check - string - The url to check
454     
455     options - string - An optional string of options to set which parts of
456             the url are required, optional, or not allowed. Each option
457             must be followed by a "+" for required, "?" for optional, or
458             "-" for not allowed. See ValidateUrlSyntax() docs for option list.
459
460     The default options are changed to:
461         s?H-S-E-F+u?P?a+I?p?f?q-r-
462
463 Examples:
464     validateFtpSyntax('ftp://netscape.com')
465     validateFtpSyntax('moz:iesucks@netscape.com')
466     validateFtpSyntax('ftp://netscape.com:2121/browsers/ns7/', 'u-')
467
468
469 Author(s):
470     Rod Apeldoorn - rod(at)canowhoopass(dot)com
471
472
473 Homepage:
474     http://www.canowhoopass.com/
475
476
477 License:
478     Copyright 2004 - Rod Apeldoorn
479
480     Released under same license as validateUrlSyntax(). For details, contact me.
481 */
482
483 function validateFtpSyntax( $ftpaddr, $options="" ){
484
485     // Check Options Parameter
486     if (!preg_match( '/^([sHSEFuPaIpfqr][+?-])*$/D', $options ))
487     {
488         trigger_error("Options attribute malformed", E_USER_ERROR);
489     }
490
491     // Set Options Array, set defaults if options are not specified
492     // Scheme
493     if (strpos( $options, 's') === false) $aOptions['s'] = '?';
494     else $aOptions['s'] = substr( $options, strpos( $options, 's') + 1, 1);
495     // http://
496     if (strpos( $options, 'H') === false) $aOptions['H'] = '-';
497     else $aOptions['H'] = substr( $options, strpos( $options, 'H') + 1, 1);
498     // https:// (SSL)
499     if (strpos( $options, 'S') === false) $aOptions['S'] = '-';
500     else $aOptions['S'] = substr( $options, strpos( $options, 'S') + 1, 1);
501     // mailto: (email)
502     if (strpos( $options, 'E') === false) $aOptions['E'] = '-';
503     else $aOptions['E'] = substr( $options, strpos( $options, 'E') + 1, 1);