The documentation is incomplete. The Vae Soli! team works hard to make it as exhaustive as possible, with plenty of useful examples and easy-to-understand explanations.
On top of that, we have decided to use our own tool to generate the documentation you read. This also takes time to fine-tune. Our goal is to have better documentation that is easier to read, easier to use, and completely integrated with our site.
Even though the documentation is NOT finalized, you can now link to our documentation pages: we have settled once and for all on the structure of our documents and where they reside on our server.
Thank you very much
The Vae Soli! team.
Generated by The Vae Soli! Documentor: Guide v. 1.3.0018 on 01-04-2015 16:19:49 (DD-MM-YYYY HH:mm:SS). This documentation is built with Vae Soli! functions and classes!
The download page of Vae Soli! contains all sources of the framework.
Additional samples are available on the samples page of Vae Soli!.
Assertions count: 0
Assertions successful: 0
Assertion failures: 0
<?php
/**************************************************************************/
/** {{{*fheader
    {*file                  LSHtml.functions.php *}
    {*purpose               HTML parsing functions *}
    {*author                Pat Y. Boens *}
    {*company               [br]Lato Sensu Management[br]
                            Rue Bois des Mazuis, 47[br]
                            5070 Vitrival[br]
                            Belgium (BE)[br]
                            [url]http://www.latosensu.be[/url][br]
                            Vae Soli! : [url]http://www.vaesoli.org[/url]
    *}
    {*cdate                 06/06/2012 - 11:18 *}
    {*mdate                 auto *}
    {*license               [url]http://creativecommons.org/licenses/by-sa/2.0/be/[/url][br]

                            To obtain detailed information about the license
                            terms, please head to the full license text
                            available in the [file]LSCopyright.php[/file] file *}

    ------------------------------------------------------------------------
    Changes History:
    ------------------------------------------------------------------------

    {*chist
        {*mdate 21/06/2012 *}
        {*v 5.0.0003 *}
        {*desc              All comments before this release have been
                            eliminated (for tracking purposes, please
                            head to the Vae Soli! archive : vaesoli-5.0.0002.zip *}
    *}

    {*chist
        {*mdate 20/09/2012 *}
        {*v 5.6.0000 *}
        {*desc              1)  Comments à la guide
                            2)  Handling microformats in automated doc (guide)
        *}

    *}}} */
/**************************************************************************/
if ( ! defined( 'VAESOLI_PATH' ) )                  /* If the path is not defined yet */
{
    /* {*define (VAESOLI_PATH) Define the path where Vae Soli! is installed *} */
    define( 'VAESOLI_PATH',__DIR__ );
}   /* if ( ! defined( 'VAESOLI_PATH' ) ) */

if ( ! defined( 'VAESOLI_PLUGINS' ) )               /* If the path is not defined yet */
{
    /* {*define (VAESOLI_PLUGINS) Define the path where plugins are located *} */
    define( 'VAESOLI_PLUGINS',VAESOLI_PATH . '/../plugins' );
}   /* if ( ! defined( 'VAESOLI_PLUGINS' ) ) */


/* ========================================================================== */
/** {{*HTML_GetScripts( $szHTML )=

    Extracts all <script>...</script> tags

    {*params
        $szHTML     (string)    Input string to parse (typically some HTML code)
    *}

    {*return
        (array)     An array of scripts. HTML_GetScripts() does not return the
                    empty scripts.
    *}

    {*example
        $this->aScripts = HTML_GetScripts( $this->szHTML );
        foreach( $this->aScripts as $szScript )
        {
            echo '<h3>New inline script</h3>';
            echo '<pre>';
            echo htmlentities( $szScript );
            echo '</pre>';
            echo '<hr />';
        }
    *}

    {*author Pat Y. Boens *}

    *}}
 */
/* ========================================================================== */
function HTML_GetScripts( $szHTML )
/*-------------------------------*/
{
    $aScripts = array();                            /* Array of scripts */

    // Caution ... the src="" case is not handled correctly

    if ( ! STR_Empty( $szHTML ) )                   /* If HTML mentioned */
    {
        //if ( preg_match_all( '%<script[^>]*>(.*?)</script>%si',
        if ( preg_match_all( '/<script[^>]*?>(.*?)<\/script>/si',
                             $szHTML,$aMatch,PREG_PATTERN_ORDER ) )
        {
            //var_dump( $aMatch );
            foreach ( $aMatch[1] as $szEcma )
            {
                $szEcma = trim( $szEcma );

                if ( ! STR_Empty( $szEcma ) )
                {
                    $aScripts[] = $szEcma;
                }
            }
        }
    }
    return ( $aScripts );                           /* Return result to caller */
}   /* End of HTML_GetScripts() ============================================= */

/* ====================================================================== */
/** {{*HTML_GetMetas( $szHTML )=

    Extracts all meta tag content attributes from a string and returns an array

    {*params
        $szHTML     (string)    Input string to parse (typically some HTML code)
    *}

    {*caution
        Only <meta name="..." content="" /> and <meta http-equiv="..." content="" />
        metas are returned.
    *}

    {*return
        (array)     The value of the name/http-equiv property becomes the key; the
                    value of the content property becomes the value.
    *}

    {*example
        $aTags = HTML_GetMetas( $szHTML );
        foreach( $aTags as $szKey => $szValue )
        {
            echo "<p>{$szKey} = {$szValue}</p>\n";
        }
    *}

    {*author Pat Y. Boens *}

    *}}
 */
/* ====================================================================== */
function HTML_GetMetas( $szHTML )
/*-----------------------------*/
{
    $aTags = array();                               /* Array of meta tags: Return value of the function */

    if ( ! STR_Empty( $szHTML ) )                   /* If HTML mentioned */
    {
        /* First treat <meta name="" ... /> */
        if ( preg_match_all( '%<meta +?name="([[:alnum:]\._-]+)".*?content="(.*?)"[ /]*?>%si',
                             $szHTML,$aMatch,PREG_PATTERN_ORDER ) )
        {
            $iMatches = count( $aMatch[0] );        /* Number of metas detected */

            for( $i = 0;$i < $iMatches;$i++ )       /* For each meta */
            {
                $szMeta         = trim( $aMatch[1][$i] );   /* Meta name */
                $szValue        = trim( $aMatch[2][$i] );   /* Meta value */
                $aTags[$szMeta] = $szValue;                 /* Add to the list of metas */
            }   /* for( $i = 0;$i < $iMatches;$i++ ) */
        }   /* if ( preg_match_all( '%<meta +?name=" */

        /* Then treat <meta http-equiv="" ... /> */
        if ( preg_match_all( '%<meta +?http-equiv="([[:alnum:]\._-]+)" *?content="(.*?)"[ /]*?>%si',
                             $szHTML,$aMatch,PREG_PATTERN_ORDER ) )
        {
            $iMatches = count( $aMatch[0] );        /* Number of metas detected */

            for( $i = 0;$i < $iMatches;$i++ )       /* For each meta */
            {
                $szMeta         = trim( $aMatch[1][$i] );   /* Meta http-equiv */
                $szValue        = trim( $aMatch[2][$i] );   /* Meta value */
                $aTags[$szMeta] = $szValue;                 /* Add to the list of metas */
            }   /* for( $i = 0;$i < $iMatches;$i++ ) */
        }   /* if ( preg_match_all( '%<meta +?http-equiv=" ... */
    }   /* if ( ! STR_Empty( $szHTML ) ) */

    return ( $aTags );                              /* Return result to caller */
}   /* End of function HTML_GetMetas() ====================================== */

/* ========================================================================== */
/** {{*HTML_IsUTF8( $szStr )=

    Determines whether the text of a page is UTF-8 encoded

    {*params
        $szStr      (string)    Input string (typically the HTML code of a web page)
    *}

    {*return
        (bool)      [c]true[/c] if $szStr is UTF-8 encoded; [c]false[/c] if not
    *}

    {*example
        if ( HTML_IsUTF8( $szText = HTTP_GetURL( $szURL ) ) )
        {
            $szText = utf8_decode( $szText );
        }
    *}

    {*author Pat Y. Boens *}

    {*note {title:algorithm}
        [p]The HTML_IsUTF8() function tries to determine if an HTML code is UTF8
        encoded by examining (in sequence):[/p]
        [ol]
            [li]the [c]http-equiv="content-type"[/c][/li]
            [li]the [c]meta="charset"[/c][/li]
            [li]run the [c]mb_detect_encoding()[/c] function on the text ($szStr)[/li]
        [/ol]
    *}

    *}}
 */
/* ========================================================================== */
function HTML_IsUTF8( $szStr )
/*--------------------------*/
{
    $bRetVal   = false;                             /* Return value of the function */
    $szCharset = 'ISO-8859-15';                     /* Default charset */

    /* Let's take a look at the meta first */
    if ( preg_match( '/<meta +http-equiv="content-type" +content="(.+?)" *\/>/si',$szStr,$aMatch ) )
    {
        $szCharset = $aMatch[1];
        //echo "MATCH 1";
    }
    elseif ( preg_match( '/<meta +charset=(["\'])(.*?)\1/si',$szStr,$aMatch ) )
    {
        $szCharset = $aMatch[2];
        //echo "MATCH 2";
    }

    /* Let's first try by ourselves */
    if ( $bRetVal = ( STR_iPos( $szCharset,'UTF-8' ) != -1 ) )   /* If UTF-8 found */
    {
        //echo "<p>",__METHOD__,"() at line ",__LINE__,": UTF-8 detected by myself</p>";
    }   /* if ( STR_iPos( $szCharset,'UTF-8' ) != -1 ) */
    else   /* Else of ... if ( STR_iPos( $szCharset,'UTF-8' ) != -1 ) */
    {
        if ( function_exists( 'mb_detect_encoding' ) )   /* Rely on PHP ... if mb extension installed */
        {
            //if ( ! is_bool( $xEncoding = mb_detect_encoding( $szStr ) ) )
            if ( ! is_bool( $xEncoding = mb_detect_encoding( $szStr,'auto',true ) ) )
            {
                if ( ( $bRetVal = ( $xEncoding === 'UTF-8' ) ) )
                {
                    //echo "<p>",__METHOD__,"() at line ",__LINE__,": UTF-8 detected with mb_detect_encoding()</p>";
                }
            }
        }   /* if ( function_exists( 'mb_detect_encoding' ) ) */
    }   /* End of ... Else of ... if ( STR_iPos( $szCharset,'UTF-8' ) != -1 ) */

    return ( $bRetVal );                            /* Return value of the function */
}   /* End of HTML_IsUTF8() ================================================= */

/* ========================================================================== */
/** {{*HTML_Cleanup( $szStr,$aParams )=

    HTML cleanup according to an array of parameters

    {*params
        $szStr      (string)    Input string to parse (typically some HTML code)
        $aParams    (array)     Array of parameters (actions to perform)[br][br]
                                $aParams['gtlt'     ] ... turn [c]'><'[/c] to [c]'> <'[/c][br]
                                $aParams['script'   ] ... remove scripts[br]
                                $aParams['style'    ] ... remove styles[br]
                                $aParams['tags'     ] ... remove tags and HTML comments[br]
                                $aParams['doctype'  ] ... remove DOCTYPE[br]
                                $aParams['marks'    ] ... remove some weird marks[br]
                                $aParams['punct'    ] ... remove punctuation[br]
                                $aParams['braces'   ] ... transform [c]'{([ ...'[/c] to a space[br]
                                $aParams['spaces'   ] ... turn multiple space instances to a single instance[br]
    *}

    {*caution
        Prototype ... do not base your code on this function yet
    *}

    {*return
        (string)    $szStr which has been treated according to the parameters
                    found in $aParams
    *}

    {*example
        // Imagine $szHTML to be an entire HTML file
        $aParams = array();

        $aParams['gtlt'     ] = true;
        $aParams['script'   ] = true;
        $aParams['punct'    ] = true;
        $aParams['braces'   ] = true;
        $aParams['spaces'   ] = true;

        $szStr = HTML_CleanUp( $szHTML,$aParams );
    *}
    *}}
 */
/* ========================================================================== */
function HTML_CleanUp( $szStr,$aParams )
/*------------------------------------*/
{
    if ( isset( $aParams['gtlt'] ) )                /* Treat '><' to turn them to '> <' */
    {
        $szStr = str_replace( '><','> <',$szStr );
    }

    if ( isset( $aParams['script'] ) )              /* Eliminate scripts */
    {
        $szStr = preg_replace( '%<script[^>]*>(.*?)</script>%si','',$szStr );
    }

    if ( isset( $aParams['style'] ) )               /* Eliminate styles */
    {
        $szStr = preg_replace('%<style[^>]*>(.*?)</style>%si','',$szStr );
    }

    if ( isset( $aParams['tags'] ) )                /* Eliminate tags */
    {
        /* Removes all HTML tags, HTML comments, and script and style tags along with their contents. */
        $szStr = preg_replace( '%</?[a-z][a-z0-9]*[^<>]*>|<!--.*?-->%si',' ',$szStr );
    }

    if ( isset( $aParams['doctype'] ) )             /* Eliminate DOCTYPE */
    {
        $szStr = preg_replace( '/<!DOCTYPE.*?>/si','',$szStr );
    }

    if ( isset( $aParams['marks'] ) )
    {
        $szStr = str_replace( array( "\t"   ,
                                     "\r\n" ,
                                     "\n"   ,
                                     ' '    ,
                                     '"'    ,
                                     '€'    ,
                                     '–'    ,
                                     '©'    ,
                                     '°'    ,
                                     '«'    ,
                                     '»'    ,
                                     '€'    ,
                                     '–'    ,
                                     '—'    ,
                                     '―'    ,
                                     ' '    ,
                                     '…' ),' ',$szStr );
    }

    if ( isset( $aParams['punct'] ) )
    {
        //$szStr = str_replace( array('?' ,'æ' ,0x93,0x94,'«','»'),
        //                      array('oe','ae','"' ,'"' ,'"','"'),
        //                      $szStr );
        //$szStr = str_replace( array('?' ,'æ' ),
        //                      array('oe','ae'),
        //                      $szStr );
        $szStr = preg_replace( '/[[:punct:]]/si',' ',$szStr );
        //$szStr = preg_replace('/[\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2A\x2B\x2C\x2D\x2E\x2F\x3A\x3B\x3C\x3D\x3E\x3F\x40\x5B\x5C\x5D\x5E\x5F\x60\x7B\x7C\x7D\x7E\x80\x82\x84\x85\x88\x8B\x91\x92\x93\x94\x96\x97\x98\x9B\xA6\xA8\xAB\xAF\xB1\xB4\xB7\xB8\xBB\xD7\xF7]/si',' ',$szStr );
        //$szStr = preg_replace('/[\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2A\x2B\x2C\x2D\x2E\x2F\x3A\x3B\x3C\x3D\x3E\x3F\x40\x5B\x5C\x5D\x5E\x5F\x60\x7B\x7C\x7D\x7E\x82\x82\x84\x85\x88\x8B\x91\x92\x96\x97\x98\x9B\xA6]/si',' ',$szStr );
    }   /* if ( isset( $aParams['punct'] ) ) */

    if ( isset( $aParams['braces'] ) )
    {
        /* Replaces all individual characters that we don't want (/(){}'"...) by a space */
        $szStr = preg_replace( '/[\x21\x23\x27\x28\x29\x2A\x2B\x2C\x2D\x2F\x3A\x3B' .
                               '\x3C\x3D\x3E\x3F\x5B\x5D\x5E\x5F\x60\x7B\x7C\x7D'   .
                               '\x7E\x7F\x82\x84\x85\x86\x87\x88\x8B\x91\x92\x93'   .
                               '\x94\x95\x96\x97\x98\x9B\xA1\xA6\xA8\xAB\xAC\xAD'   .
                               '\xAF\xB1\xB4\xB8\xBB\xBF\xF7]/si',' ',$szStr );
        //$szStr = str_replace( array( '(',')','{','}' ),
        //                      '',
        //                      $szStr );
    }

    if ( isset( $aParams['spaces'] ) )
    {
        /* Replaces multiple occurrences of ' ' with a single space */
        //$szStr = str_replace( array( ' ',' ',' ') ,
        //                      ' ' ,
        //                      preg_replace( '/\s{2,}/',' ',$szStr ) );
        $szStr = str_replace( array( ' ',' ',' ') ,
                              ' ' ,
                              $szStr );
        $szStr = str_replace( array( ' ',' ',' ') ,
                              ' ' ,
                              $szStr );
    }   /* if ( isset( $aParams['spaces'] ) ) */

    return ( $szStr );
}   /* End of HTML_CleanUp() ================================================ */

/* ========================================================================== */
/** {{*HTML_GetTitle( $szHTML )=

    Extracts the title of a HTML page

    {*params
        $szHTML     (string)    Input string to parse (typically some HTML code)
    *}

    {*return
        (string)    The title of the page or null if the title is NOT found
    *}

    {*example
        if ( ! is_null( $szTitle = HTML_GetTitle( $szHTML ) ) )
        {
            echo "<p>Title = {$szTitle}</p>\n";
        }
    *}
    *}}
 */
/* ========================================================================== */
function HTML_GetTitle( $szHTML )
/*-----------------------------*/
{
    $szRetVal = null;                               /* Return value of the method */

    if ( ! STR_Empty( $szHTML ) )                   /* If HTML mentioned */
    {
        /* Pattern matching */
        if ( preg_match( '%<title[^>]*>(.*?)</title>%si',$szHTML,$aMatch ) )
        {
            $szRetVal = $aMatch[1];                 /* Title of the page */
        }   /* if ( preg_match( '/(<title ... */
    }   /* if ( ! STR_Empty( $szHTML ) ) */

    return ( $szRetVal );                           /* Return result to caller */
}

function HTML_ParseAnchor( $szAnchor )
/*----------------------------------*/
{
    $aParts = array();

    $aParts['rel']       =
    $aParts['accesskey'] =
    $aParts['title']     =
    $aParts['href']      = null;

    if ( ! STR_Empty( $szAnchor ) )
    {
        if ( preg_match( '%<a[^>]*href=(["\'])(.*?)\1[^>]*>.*?</a>%si',$szAnchor ) )
        {
            if ( preg_match( '/href=(["\'])(.*?)\1/si',$szAnchor,$aMatch ) )
            {
                $aParts['href'] = $aMatch[2];
            }

            if ( preg_match( '/title=(["\'])(.*?)\1/si',$szAnchor,$aMatch ) )
            {
                $aParts['title'] = $aMatch[2];
            }

            if ( preg_match( '/rel=(["\'])(.*?)\1/si',$szAnchor,$aMatch ) )
            {
                $aParts['rel'] = $aMatch[2];
            }

            if ( preg_match( '/accesskey=(["\'])(.*?)\1/si',$szAnchor,$aMatch ) )
            {
                $aParts['accesskey'] = $aMatch[2];
            }
        }
    }

    return ( $aParts );
}

/* ========================================================================== */
/** {{*HTML_GetAnchors( $szHTML,$bFull )=

    Extracts the links ([c]<a>...</a>[/c]) of an HTML string

    {*params
        $szHTML     (string)    Input string to parse (typically some HTML code)
        $bFull      (bool)      Must we return the href attribute only
                                ([c]false[/c]) or the full anchor ([c]true[/c]).
                                Optional. [c]false[/c] by default.
    *}

    {*return
        (array)     An array of links (href if $bFull is [c]false[/c];
                    full anchor if $bFull is [c]true[/c])
    *}

    {*example
        $aLinks = HTML_GetAnchors( $szHTML );
        var_dump( $aLinks );
        // 0 => string '/services-core.php' (length=18)
        // 1 => string '/services-core.php' (length=18)
        // 2 => string '/web-expertise.php' (length=18)
        // 3 => string '/legal.php' (length=10)
        // 4 => string 'http://www.vaesoli.org' (length=22)
        // 5 => string '/services-core.php' (length=18)
        // 6 => string '/portfolio-core.php' (length=19)
        // 7 => string '/articles-core.php' (length=18)
        // ...
    *}
    *}}
 */
/* ====================================================================== */
function HTML_GetAnchors( $szHTML,$bFull = false )
/*----------------------------------------------*/
{
    $aLinks = array();                              /* Array of links */

    if ( ! STR_Empty( $szHTML ) )                   /* If HTML mentioned */
    {
        if ( preg_match_all( '%<a[^>]*href=(["\'])(.*?)\1[^>]*>.*?</a>%si',$szHTML,$aMatch,PREG_PATTERN_ORDER ) )
        {
            if ( $bFull )                           /* If must return entire matches */
                $aLinks = $aMatch[0];               /* array of ... <a href="/creation-site-web.php" title="..." accesskey="a" ... >Mission</a> */
            else
                $aLinks = $aMatch[2];               /* array of href ... '/creation-site-web.php' */
            //var_dump( $aLinks );
        }   /* if ( preg_match_all( '%<a[^>]*href=[\'"](.*?)[\'"][^>]*>.*?</a>%si' ... */
    }   /* if ( ! STR_Empty( $szHTML ) ) */

    return ( $aLinks );                             /* Return result to caller */
}   /* End of HTML_GetAnchors =============================================== */
VAESOLI_PATH
: Define the path where Vae Soli! is installed
VAESOLI_PLUGINS
: Define the path where plugins are located
HTML_Cleanup()
: HTML cleanup according to an array of parameters
Caution: Prototype ... do not base your code on this function yet
HTML_Cleanup( $szStr,$aParams )
Name | Type | Description |
---|---|---|
$szStr | string | Input string to parse (typically some HTML code) |
$aParams | array | Array of parameters (actions to perform): $aParams['gtlt'] ... turn '><' to '> <'; $aParams['script'] ... remove scripts; $aParams['style'] ... remove styles; $aParams['tags'] ... remove tags and HTML comments; $aParams['doctype'] ... remove DOCTYPE; $aParams['marks'] ... remove some weird marks; $aParams['punct'] ... remove punctuation; $aParams['braces'] ... transform '{([ ...' to a space; $aParams['spaces'] ... turn multiple space instances to a single instance |
(string) $szStr which has been treated according to the parameters found in $aParams
// Imagine $szHTML to be an entire HTML file
$aParams = array();

$aParams['gtlt'  ] = true;
$aParams['script'] = true;
$aParams['punct' ] = true;
$aParams['braces'] = true;
$aParams['spaces'] = true;

$szStr = HTML_CleanUp( $szHTML,$aParams );
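To make the effect of the parameters more concrete, here is a minimal sketch of three of the documented actions ('script', 'tags' and 'spaces'). The function name demo_cleanup() is hypothetical and not part of Vae Soli!; the first two patterns are copied from the listing, while the whitespace step uses a plain preg_replace() as an assumption about the intended behaviour.

```php
<?php
// Hypothetical helper, NOT part of Vae Soli!: applies three of the documented
// actions with plain PCRE calls.
function demo_cleanup(string $html): string
{
    // 'script': drop <script> elements together with their contents
    $text = preg_replace('%<script[^>]*>(.*?)</script>%si', '', $html);

    // 'tags': replace remaining tags and HTML comments with a space
    $text = preg_replace('%</?[a-z][a-z0-9]*[^<>]*>|<!--.*?-->%si', ' ', $text);

    // 'spaces' (assumed behaviour): collapse any run of whitespace into one space
    $text = preg_replace('/\s+/', ' ', $text);

    return trim($text);
}

echo demo_cleanup('<p>Hello <b>world</b></p><script>alert(1);</script><!-- note -->');
// prints: Hello world
```

Scripts are removed before tags so that script bodies do not survive as plain text once their surrounding tags are gone.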
WARNING: No Unit Testing found. Please provide assertions with assertion constructs ({*assert ... *}) or with GuideAssert() function calls in exec constructs ({*exec LSUnitTesting::assert(...); *}).
HTML_GetAnchors()
: Extracts the links (<a>...</a>) of an HTML string
HTML_GetAnchors( $szHTML,$bFull )
Name | Type | Description |
---|---|---|
$szHTML | string | Input string to parse (typically some HTML code) |
$bFull | bool | Must we return the href attribute only (false) or the full anchor (true). Optional. false by default. |
(array) An array of links (href if $bFull is false; full anchor if $bFull is true)
$aLinks = HTML_GetAnchors( $szHTML );
var_dump( $aLinks );
// 0 => string '/services-core.php' (length=18)
// 1 => string '/services-core.php' (length=18)
// 2 => string '/web-expertise.php' (length=18)
// 3 => string '/legal.php' (length=10)
// 4 => string 'http://www.vaesoli.org' (length=22)
// 5 => string '/services-core.php' (length=18)
// 6 => string '/portfolio-core.php' (length=19)
// 7 => string '/articles-core.php' (length=18)
// ...
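The difference between the two values of $bFull can be illustrated with a small standalone sketch. demo_get_anchors() is a hypothetical stand-in that reuses the pattern from the listing; it skips the STR_Empty() check, which is defined elsewhere in the framework.

```php
<?php
// Hypothetical stand-in for HTML_GetAnchors(); the pattern is copied from the listing.
function demo_get_anchors(string $html, bool $full = false): array
{
    $links = [];

    if (preg_match_all('%<a[^>]*href=(["\'])(.*?)\1[^>]*>.*?</a>%si', $html, $match, PREG_PATTERN_ORDER)) {
        // $match[0] holds the whole anchors, $match[2] the href values
        $links = $full ? $match[0] : $match[2];
    }

    return $links;
}

$html = '<p><a href="/legal.php" title="Legal">Legal</a> '
      . "<a href='http://www.vaesoli.org'>Home</a></p>";

print_r(demo_get_anchors($html));        // the two href values
print_r(demo_get_anchors($html, true));  // the two complete <a>...</a> elements
```

Because the pattern captures the opening quote and requires the same character to close the value, both single-quoted and double-quoted href attributes are matched; unquoted href values are not.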
HTML_GetMetas()
: Extracts all meta tag content attributes from a string and returns them as an array
Caution: Only <meta name="..." content="" /> and <meta http-equiv="..." content="" /> metas are returned.
HTML_GetMetas( $szHTML )
Name | Type | Description |
---|---|---|
$szHTML | string | Input string to parse (typically some HTML code) |
(array) The value of the name/http-equiv property becomes the key; the value of the content property becomes the value.
$aTags = HTML_GetMetas( $szHTML );
foreach( $aTags as $szKey => $szValue )
{
    echo "<p>{$szKey} = {$szValue}</p>\n";
}
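The following sketch shows what the returned array might look like and which metas fall outside the two supported forms. demo_get_metas() is a hypothetical stand-in built from the two patterns in the listing; note that both patterns expect name or http-equiv to come right after <meta and before content, with double-quoted values.

```php
<?php
// Hypothetical stand-in for HTML_GetMetas(); both patterns are copied from the listing.
function demo_get_metas(string $html): array
{
    $tags     = [];
    $patterns = [
        '%<meta +?name="([[:alnum:]\._-]+)".*?content="(.*?)"[ /]*?>%si',
        '%<meta +?http-equiv="([[:alnum:]\._-]+)" *?content="(.*?)"[ /]*?>%si',
    ];

    foreach ($patterns as $pattern) {
        if (preg_match_all($pattern, $html, $match, PREG_PATTERN_ORDER)) {
            foreach ($match[1] as $i => $name) {
                // name/http-equiv becomes the key, content becomes the value
                $tags[trim($name)] = trim($match[2][$i]);
            }
        }
    }

    return $tags;
}

$html = '<meta name="description" content="HTML parsing functions" />'
      . '<meta http-equiv="content-type" content="text/html; charset=UTF-8" />'
      . '<meta content="ignored" name="reversed" />';   // content first: not matched

print_r(demo_get_metas($html));
// Keys returned: 'description' and 'content-type'; the third meta is skipped.
```

Since the meta name is used as the array key, a later meta with the same name overwrites an earlier one.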
HTML_GetScripts()
: Extracts all <script>...</script> tags
HTML_GetScripts( $szHTML )
Name | Type | Description |
---|---|---|
$szHTML | string | Input string to parse (typically some HTML code) |
(array) An array of scripts. HTML_GetScripts() does not return empty scripts.
$this->aScripts = HTML_GetScripts( $this->szHTML );
foreach( $this->aScripts as $szScript )
{
    echo '<h3>New inline script</h3>';
    echo '<pre>';
    echo htmlentities( $szScript );
    echo '</pre>';
    echo '<hr />';
}
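As a rough illustration of the "no empty scripts" rule, the sketch below runs the pattern from the listing over one external and one inline script. demo_get_scripts() is a hypothetical stand-in, and the comparison with an empty string is an assumption standing in for STR_Empty(), which is defined elsewhere in the framework.

```php
<?php
// Hypothetical stand-in for HTML_GetScripts(); the pattern is copied from the listing.
function demo_get_scripts(string $html): array
{
    $scripts = [];

    if (preg_match_all('/<script[^>]*?>(.*?)<\/script>/si', $html, $match, PREG_PATTERN_ORDER)) {
        foreach ($match[1] as $body) {
            $body = trim($body);

            if ($body !== '') {           // assumed equivalent of ! STR_Empty()
                $scripts[] = $body;
            }
        }
    }

    return $scripts;
}

$html = '<script src="app.js"></script><script> var n = 1; </script>';

print_r(demo_get_scripts($html));
// Only the inline script body ('var n = 1;') is returned.
```

Only the body between the tags is captured, so a script that is loaded through a src attribute and has no inline body yields nothing.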
HTML_GetTitle()
: Extracts the title of an HTML page
HTML_GetTitle( $szHTML )
Name | Type | Description |
---|---|---|
$szHTML | string | Input string to parse (typically some HTML code) |
(string) The title of the page or null if the title is NOT found
if ( ! is_null( $szTitle = HTML_GetTitle( $szHTML ) ) )
{
    echo "<p>Title = {$szTitle}</p>\n";
}
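In the listing, the captured title is returned as found in the markup, without trimming or entity decoding. The sketch below uses a hypothetical stand-in, demo_get_title(), built on the same pattern, and shows one possible way for a caller to tidy the result; the tidy-up step is a suggestion, not something the function does.

```php
<?php
// Hypothetical stand-in for HTML_GetTitle(); the pattern is copied from the listing.
function demo_get_title(string $html): ?string
{
    if (preg_match('%<title[^>]*>(.*?)</title>%si', $html, $match)) {
        return $match[1];     // raw title, exactly as found in the markup
    }

    return null;              // no <title> element found
}

$raw   = demo_get_title("<html><head><title>\n  Tom &amp; Jerry </title></head></html>");
$clean = is_null($raw) ? null : html_entity_decode(trim($raw), ENT_QUOTES, 'UTF-8');

var_dump($clean);                                   // string(11) "Tom & Jerry"
var_dump(demo_get_title('<p>No title here</p>'));   // NULL
```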
HTML_IsUTF8()
: Determines whether the text of a page is UTF-8 encoded
The HTML_IsUTF8() function tries to determine whether HTML code is UTF-8 encoded by examining (in sequence):
1. the http-equiv="content-type" meta
2. the charset attribute of a meta tag (<meta charset="...">)
3. the result of running the mb_detect_encoding() function on the text ($szStr)
HTML_IsUTF8( $szStr )
Name | Type | Description |
---|---|---|
$szStr | string | Input string (typically the HTML code of a web page) |
(bool) true if $szStr is UTF-8 encoded; false if not
if ( HTML_IsUTF8( $szText = HTTP_GetURL( $szURL ) ) )
{
    $szText = utf8_decode( $szText );
}
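The detection sequence can be sketched as a standalone function. demo_is_utf8() is a hypothetical stand-in: the two meta patterns and the ISO-8859-15 default come from the listing, while stripos() is an assumed replacement for the framework's STR_iPos().

```php
<?php
// Hypothetical stand-in for HTML_IsUTF8(); patterns and default charset are
// copied from the listing, stripos() is an assumed substitute for STR_iPos().
function demo_is_utf8(string $html): bool
{
    $charset = 'ISO-8859-15';                       // default charset

    // Steps 1 and 2: look for a declared charset in the metas
    if (preg_match('/<meta +http-equiv="content-type" +content="(.+?)" *\/>/si', $html, $match)) {
        $charset = $match[1];
    } elseif (preg_match('/<meta +charset=(["\'])(.*?)\1/si', $html, $match)) {
        $charset = $match[2];
    }

    if (stripos($charset, 'UTF-8') !== false) {
        return true;                                // declared as UTF-8
    }

    // Step 3: fall back on mbstring, if the extension is installed
    if (function_exists('mb_detect_encoding')) {
        return mb_detect_encoding($html, 'auto', true) === 'UTF-8';
    }

    return false;
}

var_dump(demo_is_utf8('<meta charset="utf-8"><p>plain</p>'));   // bool(true)
var_dump(demo_is_utf8('<meta http-equiv="content-type" content="text/html; charset=UTF-8" />'));   // bool(true)
```

When adapting the example above for recent PHP versions, note that utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding( $szText,'ISO-8859-1','UTF-8' ) is the usual replacement.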