I've been working on a little project and need a PHP function that can linkify URLs in my data, while letting me set exceptions for links I don't want linkified. Any ideas on how to do this?
- Are links easily identifiable? Properly formed, like "example.com/someuri"? Or just random-ish text that could be a hostname, possibly followed by a directory or query? – Marc B, Feb 22, 2011 at 16:23
- I need to be able to filter out YouTube links, so it might be youtube.com/watch?v=cqhsunMOWKQ&feature=feedu – Hirvesh, Feb 22, 2011 at 16:25
- and many other links from different media sites like Flickr, Vimeo, etc. – Hirvesh, Feb 22, 2011 at 16:26
- Are you sure you want to do that in PHP? It's relatively easy to do this in JavaScript using jQuery. – Stofke, Feb 22, 2011 at 17:02
- Yep, I need it in PHP, because that's where I process the data. Isn't it possible to create a function that takes an array of domains to filter out and just linkifies the rest? I'm stuck here. – Hirvesh, Feb 22, 2011 at 17:04
3 Answers
I have an open source project on GitHub, LinkifyURL, which you may want to consider. It has a function, linkify(), which plucks URLs from text and converts them to links. Note that this is not a trivial task to do correctly! (See: The Problem With URLs, and be sure to read the comment thread to grasp all the things that can go wrong.)
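To see why it is not trivial, here is a minimal sketch (not taken from LinkifyURL) of the most obvious approach, a greedy "http(s):// followed by non-whitespace" pattern, applied to a URL that sits in parentheses at the end of a sentence:

```php
<?php
// A deliberately naive pattern: "http://" or "https://" followed by any run of
// non-whitespace characters. Shown only to illustrate a pitfall.
$naive = '~https?://\S+~';

$text = 'See the docs (http://example.com/page).';
preg_match($naive, $text, $m);

// The greedy \S+ swallows the closing parenthesis and the sentence's period,
// so the extracted "URL" is "http://example.com/page)." instead of
// "http://example.com/page".
echo $m[0], "\n";
```

Enclosing brackets and trailing punctuation are exactly the cases that the delimiter alternatives and the "last char can't be [.!&',;:?]" rule in the regex below are there to handle.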
If you really need to NOT linkify specific domains (i.e. vimeo and youtube), here is a modified PHP function linkify_filtered (in the form of a working test script) that does what you need:
```php
<?php // test.php 20110313_1200
function linkify_filtered($text) {
    $url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL
    # Match http & ftp URL that is not already linkified.
      # Alternative 1: URL delimited by (parentheses).
      (\()                     # $1: "(" start delimiter.
      ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $2: URL.
      (\))                     # $3: ")" end delimiter.
    | # Alternative 2: URL delimited by [square brackets].
      (\[)                     # $4: "[" start delimiter.
      ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $5: URL.
      (\])                     # $6: "]" end delimiter.
    | # Alternative 3: URL delimited by {curly braces}.
      (\{)                     # $7: "{" start delimiter.
      ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $8: URL.
      (\})                     # $9: "}" end delimiter.
    | # Alternative 4: URL delimited by <angle brackets>.
      (<|&(?:lt|\#60|\#x3c);)  # $10: "<" start delimiter (or HTML entity).
      ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $11: URL.
      (>|&(?:gt|\#62|\#x3e);)  # $12: ">" end delimiter (or HTML entity).
    | # Alternative 5: URL not delimited by (), [], {} or <>.
      (                        # $13: Prefix proving URL not already linked.
        (?: ^                  # Can be a beginning of line or string, or
        | [^=\s\'"\]]          # a non-"=", non-quote, non-"]", followed by
        ) \s*[\'"]?            # optional whitespace and optional quote;
      | [^=\s]\s+              # or... a non-equals sign followed by whitespace.
      )                        # End $13. Non-prelinkified-proof prefix.
      ( \b                     # $14: Other non-delimited URL.
        (?:ht|f)tps?:\/\/      # Required literal http, https, ftp or ftps prefix.
        [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]+ # All URI chars except "&" (normal*).
        (?:                    # Either on a "&" or at the end of URI.
          (?!                  # Allow a "&" char only if not start of an...
            &(?:gt|\#0*62|\#x0*3e);                  # HTML ">" entity, or
          | &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if
            [.!&\',:?;]?        # followed by optional punctuation then
            (?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]|$)  # a non-URI char or EOS.
          ) &                  # If neg-assertion true, match "&" (special).
          [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]* # More non-& URI chars (normal*).
        )*                     # Unroll-the-loop (special normal*)*.
        [a-z0-9\-_~$()*+=\/#[\]@%]  # Last char can\'t be [.!&\',;:?]
      )                        # End $14. Other non-delimited URL.
    /imx';
    // $url_replace = '$1$4$7$10$13<a href="$2$5$8$11$14">$2$5$8$11$14</a>$3$6$9$12';
    // return preg_replace($url_pattern, $url_replace, $text);
    $url_replace = '_linkify_filter_callback';
    return preg_replace_callback($url_pattern, $url_replace, $text);
}

function _linkify_filter_callback($m) {
    // preg_replace_callback() omits trailing groups that did not take part in
    // the match, so pad $m with empty strings to avoid undefined-index notices.
    $m += array_fill(0, 15, '');
    $pre  = $m[1].$m[4].$m[7].$m[10].$m[13];
    $url  = $m[2].$m[5].$m[8].$m[11].$m[14];
    $post = $m[3].$m[6].$m[9].$m[12];
    // Filter out youtube and vimeo domains (case-insensitive): leave as plain text.
    if (preg_match('/\b(?:youtube|vimeo)\.com\b/i', $url)) {
        return $pre . $url . $post;
    }
    // else linkify...
    return $pre .'<a href="'. $url .'">' . $url .'</a>' .$post;
}

// Create some test data.
$data = 'Plain URLs (not delimited):
foo http://example.com bar...
foo http://example.com:80 bar...
foo http://example.com:80/path/ bar...
foo http://example.com:80/path/file.txt bar...
foo http://example.com:80/path/file.txt?query=val&var2=val2 bar...
foo http://example.com:80/path/file.txt?query=val&var2=val2#fragment bar...
foo http://example.com/(file\'s_name.txt) bar... (with \' and (parentheses))
foo http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348] bar... ([IPv6 literal])
foo http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348]/file.txt bar... ([IPv6] with path)
foo http://youtube.com bar...
foo http://youtube.com:80 bar...
foo http://youtube.com:80/path/ bar...
foo http://youtube.com:80/path/file.txt bar...
foo http://youtube.com:80/path/file.txt?query=val&var2=val2 bar...
foo http://youtube.com:80/path/file.txt?query=val&var2=val2#fragment bar...
foo http://youtube.com/(file\'s_name.txt) bar... (with \' and (parentheses))
foo http://vimeo.com bar...
foo http://vimeo.com:80 bar...
foo http://vimeo.com:80/path/ bar...
foo http://vimeo.com:80/path/file.txt bar...
foo http://vimeo.com:80/path/file.txt?query=val&var2=val2 bar...
foo http://vimeo.com:80/path/file.txt?query=val&var2=val2#fragment bar...
foo http://vimeo.com/(file\'s_name.txt) bar... (with \' and (parentheses))
';
// Verify it works...
echo(linkify_filtered($data) ."\n");
?>
```

This employs a callback function to do the filtering. Yes, the regex is complex (but so is the problem, as it turns out!). You can see the interactive JavaScript version of linkify() in action here: URL Linkification (HTTP/FTP).
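The callback above hard-codes youtube and vimeo and looks for those names anywhere in the URL string, so a link such as http://example.com/?ref=youtube.com would also be left unlinked. Since the question asks about passing an array of domains, here is a minimal sketch that compares each URL's host, obtained with parse_url(), against an exclusion list. The names linkify_except() and host_is_excluded() are hypothetical, and the simple URL pattern is an assumption made for brevity: it is much less careful than the LinkifyURL regex (for example, it does not skip URLs that are already inside <a> tags), so for real input you would more likely call host_is_excluded() from _linkify_filter_callback() instead.

```php
<?php
// Hypothetical helper: true if $url's host equals one of $excluded, or is a
// subdomain of one (e.g. "www.youtube.com" matches "youtube.com").
function host_is_excluded(string $url, array $excluded): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no parseable host: treat as not excluded
    }
    $host = strtolower($host);
    foreach ($excluded as $domain) {
        $domain = strtolower($domain);
        // Exact match, or host ends with ".<domain>".
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

// Hypothetical linkifier taking an array of domains to leave alone.
// Assumed simple pattern: http(s):// plus non-space chars, where the last
// char may not be common trailing punctuation or a closing parenthesis.
function linkify_except(string $text, array $excluded): string
{
    $pattern = '~\bhttps?://[^\s<>"\']+[^\s<>"\'.,;:!?)]~i';
    return preg_replace_callback($pattern, function (array $m) use ($excluded) {
        $url = $m[0];
        if (host_is_excluded($url, $excluded)) {
            return $url; // excluded domain: keep as plain text
        }
        $safe = htmlspecialchars($url, ENT_QUOTES);
        return '<a href="' . $safe . '">' . $safe . '</a>';
    }, $text);
}

echo linkify_except(
    'foo http://example.com/a bar http://www.youtube.com/watch?v=x baz https://vimeo.com/1.',
    ['youtube.com', 'vimeo.com']
), "\n";
```

Matching on the host and its subdomains, rather than on a substring of the whole URL, means www.youtube.com is excluded by "youtube.com" while notyoutube.com is not.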
Also, John Gruber has a pretty good regex for linkification. See: An Improved Liberal, Accurate Regex Pattern for Matching URLs. However, his regex suffers from catastrophic backtracking under certain circumstances. (I've written to him about this, but he has yet to respond.)
Hope this helps! :)
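```php
$string = "some text and a link http://www.google.com";
// ereg_replace() was deprecated in PHP 5.3 and removed in PHP 7.0; the same
// POSIX-class pattern works with preg_replace(), where $0 is the whole match.
$new_string = preg_replace('~[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]~', '<a href="$0">$0</a>', $string);
```

or use: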