get_meta_tags

(PHP 4, PHP 5, PHP 7, PHP 8)

get_meta_tags - Extracts all content attributes from the meta tags of a file and returns an array

Description

function get_meta_tags(string $filename, bool $use_include_path = false): array|false

Opens filename and parses it line by line for <meta> tags in the file. The parsing stops at </head>.
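
The sketch below shows the </head> cut-off. It assumes a writable temporary directory and writes a small hypothetical page to a scratch file created with tempnam(). A <meta> tag placed after </head> is not returned.

```php
<?php
// Hypothetical page: one <meta> tag inside <head>, one after </head>.
$html = <<<HTML
<html>
<head>
<meta name="author" content="name">
</head>
<body>
<meta name="late" content="after head">
</body>
</html>
HTML;

$file = tempnam(sys_get_temp_dir(), 'meta'); // scratch file, removed below
file_put_contents($file, $html);
$tags = get_meta_tags($file);
unlink($file);

var_dump(isset($tags['author'])); // bool(true)
var_dump(isset($tags['late']));   // bool(false): parsing stopped at </head>
```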

Parameters

filename

The path to the HTML file, as a string. This can be a local file or a URL.

Example #1 What get_meta_tags() parses

<meta name="author" content="name">
<meta name="keywords" content="php documentation">
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
</head> <!-- parsing stops here -->

use_include_path

If use_include_path is set to true, PHP will try to open the file along the standard include path as per the include_path INI directive. This is used for local files, not URLs.
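
Here is a minimal sketch of use_include_path. It assumes a writable temporary directory, and both the directory name and page.html are hypothetical. The relative file name is found only because set_include_path() has put that directory on include_path. The original include_path is restored afterwards.

```php
<?php
// Hypothetical scratch directory holding a page that is reachable only via include_path.
$dir = sys_get_temp_dir() . '/meta-include-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/page.html', '<head><meta name="author" content="name"></head>');

$previous = set_include_path($dir);        // set_include_path() returns the old value
$tags = get_meta_tags('page.html', true);  // 'page.html' is looked up along include_path
set_include_path($previous);               // restore the original include_path

unlink($dir . '/page.html');
rmdir($dir);

echo $tags['author'], "\n"; // name
```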

Return Values

Returns an array with all the parsed meta tags.

The value of the name attribute becomes the key and the value of the content attribute becomes the value of the returned array, so you can easily use standard array functions to traverse it or access single values. Special characters in the value of the name attribute are substituted with '_', and all other characters are converted to lower case. If two meta tags have the same name, only the last one is returned.
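
The next sketch applies these rules to a hypothetical scratch file. DESCRIPTION is lowercased and the . in geo.position becomes _. Of the two author tags, only the last one survives.

```php
<?php
// Hypothetical page exercising key normalisation and duplicate names.
$html = <<<HTML
<head>
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
<meta name="author" content="first">
<meta name="author" content="second">
</head>
HTML;

$file = tempnam(sys_get_temp_dir(), 'meta'); // scratch file, removed below
file_put_contents($file, $html);
$tags = get_meta_tags($file);
unlink($file);

echo $tags['description'], "\n";  // a php manual
echo $tags['geo_position'], "\n"; // 49.33;-86.59
echo $tags['author'], "\n";       // second
```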

Returns false on failure.
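
The following sketch checks the return value. It assumes the hypothetical path below does not exist. Compare with === false, because a page without any name meta tags yields an empty array, and an empty array is also falsy.

```php
<?php
// Hypothetical path that is not expected to exist.
$path = sys_get_temp_dir() . '/no-such-page-' . uniqid() . '.html';

$tags = @get_meta_tags($path); // @ suppresses the "Failed to open stream" warning

if ($tags === false) {
    echo "Could not read meta tags from $path\n";
} else {
    echo count($tags), " meta tags found\n";
}
```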

Examples

Example #2 What get_meta_tags() returns

<?php
// Assuming the above tags are at www.example.com
$tags = get_meta_tags('http://www.example.com/');

// Notice how the keys are all lowercase now
// and how . was replaced by _ in the key.
echo $tags['author'];       // name
echo $tags['keywords'];     // php documentation
echo $tags['description'];  // a php manual
echo $tags['geo_position']; // 49.33;-86.59
?>

Notes

Note:
Only meta tags with name attributes will be parsed. Quotes are not required.
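
The sketch below illustrates this note on a hypothetical scratch file. The unquoted name=author content=Jane tag is returned. The http-equiv and property tags have no name attribute, so they are skipped.

```php
<?php
// Hypothetical page: one unquoted name= tag, two tags without a name attribute.
$html = <<<HTML
<head>
<meta name=author content=Jane>
<meta http-equiv="refresh" content="5">
<meta property="og:title" content="Example">
</head>
HTML;

$file = tempnam(sys_get_temp_dir(), 'meta'); // scratch file, removed below
file_put_contents($file, $html);
$tags = get_meta_tags($file);
unlink($file);

echo implode(', ', array_keys($tags)), "\n"; // author
```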

See Also

htmlentities() - Convert all applicable characters to HTML entities
urlencode() - URL-encodes string

User Contributed Notes 9 notes

bobble bubble

10 years ago

This regex gets meta tags regardless of attribute order by capturing inside a lookahead.
It also uses the branch reset feature (?|...) to handle the different quote styles of values.
The pattern can be tested here: https://regex101.com/r/oE4oU9/1

<?PHP

function getMetaTags($str)
{
  $pattern = '
  ~<\s*meta\s

  # using lookahead to capture type to $1
    (?=[^>]*?
    \b(?:name|property|http-equiv)\s*=\s*
    (?|"\s*([^"]*?)\s*"|\'\s*([^\']*?)\s*\'|
    ([^"\'>]*?)(?=\s*/?\s*>|\s\w+\s*=))
  )

  # capture content to $2
  [^>]*?\bcontent\s*=\s*
    (?|"\s*([^"]*?)\s*"|\'\s*([^\']*?)\s*\'|
    ([^"\'>]*?)(?=\s*/?\s*>|\s\w+\s*=))
  [^>]*>

  ~ix';
  
  if(preg_match_all($pattern, $str, $out))
    return array_combine($out[1], $out[2]);
  return array();
}

// usage: $str holds the HTML source to scan
$str = file_get_contents('http://www.example.com/');
$meta_tags = getMetaTags($str);

?>

richard dot dern at athaliasoft dot fr

12 years ago

I personally ran into fewer issues using the DOM functions than regular expressions when fetching meta tags without the get_meta_tags() function (which I avoided in order to get http-equiv meta tags too).

<?php

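$html = file_get_contents('http://www.example.com/'); // the page to inspect

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about invalid real-world HTML
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

$nodes = $xpath->query('//head/meta');

foreach($nodes as $node) {
    // name or http-equiv identifies the tag; content holds its value
    $key = $node->getAttribute('name') ?: $node->getAttribute('http-equiv');
    echo $key, ' => ', $node->getAttribute('content'), "\n";
}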

?>

Ebpo

13 years ago

Be aware that the function looks for meta tags in the whole page. If one of the meta tags is commented out in your code for some reason, it will still be grabbed.
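
One possible workaround is to strip HTML comments first, write the result to a hypothetical scratch file, and parse that file. This sketch assumes that removing comments with a simple regex is acceptable for your pages. Such a regex is not a full HTML parser, so it also removes <!-- ... --> sequences inside scripts.

```php
<?php
// Hypothetical page with one commented-out <meta> tag.
$html = <<<HTML
<head>
<!-- <meta name="robots" content="noindex"> -->
<meta name="author" content="name">
</head>
HTML;

// Non-greedy match; the s modifier lets a comment span several lines.
$clean = preg_replace('/<!--.*?-->/s', '', $html);

$file = tempnam(sys_get_temp_dir(), 'meta'); // scratch file, removed below
file_put_contents($file, $clean);
$tags = get_meta_tags($file);
unlink($file);

var_dump(isset($tags['robots'])); // bool(false)
```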

rehfeld

21 years ago

In response to jp at webgraphe dot com:

this function grabs meta tags, not HTTP headers.

If you need the headers:

<?php

$fp = fopen('http://example.org/somepage.html', 'r');

// the variable $http_response_header magically appears
print_r($http_response_header);

// or
$meta_data = stream_get_meta_data($fp);
print_r($meta_data);

?>

richard at pifmagazine dot com

26 years ago

An important note about META tags and this function: if your META tag contains newline ("\n") characters, get_meta_tags() will return a NULL value for that name property. Removing the newlines from the source META tag corrects the problem.
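
If you run into this, the sketch below applies the workaround the note describes. It replaces the line breaks inside each <meta ...> tag with a single space before parsing. normalizeMetaNewlines() is a hypothetical helper name. The regex assumes that no > character appears inside attribute values.

```php
<?php
// Hypothetical helper: collapse line breaks (and surrounding whitespace)
// inside every <meta ...> tag into a single space.
function normalizeMetaNewlines(string $html): string
{
    return preg_replace_callback(
        '/<meta\b[^>]*>/i', // one whole <meta ...> tag
        fn(array $m): string => preg_replace('/\s*[\r\n]+\s*/', ' ', $m[0]),
        $html
    );
}

$html = "<meta name=\"description\"\n      content=\"a php\nmanual\">";
echo normalizeMetaNewlines($html), "\n"; // <meta name="description" content="a php manual">
```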

mariano at cricava dot com

20 years ago

Based on Michael Knapp's code, and adding some regex, here's a function that will get all meta tags and the title based on a URL. If there's an error, it will return false. Using the function getUrlContents(), also included, it takes care of META REFRESH redirections, following up to the specified number of redirections. Please note that the regular expressions included were split into strings because php.net was complaining about the line being too long ;)

<?php
function getUrlData($url)
{
    $result = false;
    
    $contents = getUrlContents($url);

    if (isset($contents) && is_string($contents))
    {
        $title = null;
        $metaTags = null;
        
        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );

        if (isset($match) && is_array($match) && count($match) > 0)
        {
            $title = strip_tags($match[1]);
        }
        
        preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
        
        if (isset($match) && is_array($match) && count($match) == 3)
        {
            $originals = $match[0];
            $names = $match[1];
            $values = $match[2];
            
            if (count($originals) == count($names) && count($names) == count($values))
            {
                $metaTags = array();
                
                for ($i=0, $limiti=count($names); $i < $limiti; $i++)
                {
                    $metaTags[$names[$i]] = array (
                        'html' => htmlentities($originals[$i]),
                        'value' => $values[$i]
                    );
                }
            }
        }
        
        $result = array (
            'title' => $title,
            'metaTags' => $metaTags
        );
    }
    
    return $result;
}

function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
    $result = false;
    
    $contents = @file_get_contents($url);
    
    // Check if we need to go somewhere else
    
    if (isset($contents) && is_string($contents))
    {
        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
        
        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
        {
            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
            {
                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }
            
            $result = false;
        }
        else
        {
            $result = $contents;
        }
    }
    
    return $result;
}
?>

Here's an example of its usage. Note that the URL used here has a META REFRESH redirection:

<?php
$result = getUrlData('http://www.marianoiglesias.com.ar/');

echo '<pre>'; print_r($result); echo '</pre>';

?>

For the above code the output would be:

<?php
Array
(
    [title] => Mariano Iglesias: El Eternauta    
    [metaTags] => Array
        (
            [description] => Array
                (
                    [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
                    [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
                )

            [DC.title] => Array
                (
                    [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
                    [value] => Mariano Iglesias - Weblog
                )

            [ICBM] => Array
                (
                    [html] => <meta name="ICBM" content="-34.6017, -58.3956" />
                    [value] => -34.6017, -58.3956
                )

            [geo.position] => Array
                (
                    [html] => <meta name="geo.position" content="-34.6017;-58.3956" />
                    [value] => -34.6017;-58.3956
                )

            [geo.region] => Array
                (
                    [html] => <meta name="geo.region" content="AR-BA">
                    [value] => AR-BA
                )

            [geo.placename] => Array
                (
                    [html] => <meta name="geo.placename" content="Buenos Aires">
                    [value] => Buenos Aires
                )

        )

)
?>

LWC

11 years ago

New version based on mariano at cricava dot com's work with:
1) Support for Meta properties (like Facebook's og tags).
2) Support for Unicode (UTF-8) encoded Meta lines.
3) An option not to convert htmlentities - if you plan to actually use the results and not just display them.

<?php
function getUrlData($url, $raw=false) // $raw = true: HTML-escape the matched tags for display
{
    $result = false;
   
    $contents = getUrlContents($url);

    if (isset($contents) && is_string($contents))
    {
        $title = null;
        $metaTags = null;
        $metaProperties = null;
       
        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match );

        if (isset($match) && is_array($match) && count($match) > 0)
        {
            $title = strip_tags($match[1]);
        }
       
        preg_match_all('/<[\s]*meta[\s]*(name|property)="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
       
        if (isset($match) && is_array($match) && count($match) == 4)
        {
            $originals = $match[0];
            $names = $match[2];
            $values = $match[3];
           
            if (count($originals) == count($names) && count($names) == count($values))
            {
                $metaTags = array();
                $metaProperties = $metaTags;
                if ($raw) {
                    if (version_compare(PHP_VERSION, '5.4.0') == -1)
                         $flags = ENT_COMPAT;
                    else
                         $flags = ENT_COMPAT | ENT_HTML401;
                }
               
                for ($i=0, $limiti=count($names); $i < $limiti; $i++)
                {
                    if ($match[1][$i] == 'name')
                         $meta_type = 'metaTags';
                    else
                         $meta_type = 'metaProperties';
                    if ($raw)
                        ${$meta_type}[$names[$i]] = array (
                            'html' => htmlentities($originals[$i], $flags, 'UTF-8'),
                            'value' => $values[$i]
                        );
                    else
                        ${$meta_type}[$names[$i]] = array (
                            'html' => $originals[$i],
                            'value' => $values[$i]
                        );
                }
            }
        }
       
        $result = array (
            'title' => $title,
            'metaTags' => $metaTags,
            'metaProperties' => $metaProperties,
        );
    }
   
    return $result;
}

function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
    $result = false;
   
    $contents = @file_get_contents($url);
   
    // Check if we need to go somewhere else
   
    if (isset($contents) && is_string($contents))
    {
        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
       
        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
        {
            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
            {
                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }
           
            $result = false;
        }
        else
        {
            $result = $contents;
        }
    }
   
    return $result;
}
?>

<?php
$result = getUrlData('http://whatever...', true);

echo '<pre>'; print_r($result); echo '</pre>';

?>

Output example:

<?php
Array
(
    [title] => The requested page's title
    [metaTags] => Array
        (
            [description] => Array
                (
                    [html] => <meta name="description" content="Something..." />
                    [value] => Something...
                )
        )
    [metaProperties] => Array
        (
            [og:type] => Array
                (
                    [html] => <meta property="og:type" content="article"/>
                    [value] => article
                )
        )
)
?>

roganty at gmail dot com

19 years ago

This is a slight amendment to jimmyxx at gmail dot com's function.

I tried using the regex shown in his code, and PHP threw a couple of errors.

Below is the corrected regular expression, which works.
(Please note that I had to split the regex into strings because php.net was complaining about the line being too long.)
<?php
preg_match_all(
   // [^>]* at the end also matches tags that close directly after the content value
   "|<meta[^>]+name=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>|i",
   $html, $out,PREG_PATTERN_ORDER);
?>

The problem was due to the quotes being incorrectly escaped.
I hope this helps anyone who has been having problems with his code
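
Here is a usage sketch for this regex on an inline, hypothetical HTML string. It turns the two capture groups into a name => content array with array_combine(). The pattern here ends with [^>]*, so a tag that closes right after the content value (with no space or /) still matches. Like the note's version, it expects double-quoted attributes, with name before content.

```php
<?php
// Hypothetical input: one tag closing with ">" and one with " />".
$html = '<meta name="author" content="name"><meta name="keywords" content="php documentation" />';

preg_match_all(
    '|<meta[^>]+name="([^"]*)"[^>]+content="([^"]*)"[^>]*>|i',
    $html,
    $out,
    PREG_PATTERN_ORDER
);

// $out[1] holds the name values, $out[2] the matching content values.
$meta = array_combine($out[1], $out[2]);
echo $meta['keywords'], "\n"; // php documentation
```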

jp at webgraphe dot com

22 years ago

If the URL redirects using headers (as you would with the PHP function header("Location: URL");), the page generally has no content. It appears get_meta_tags() doesn't follow that kind of redirection (as cURL would), and it led to a timeout in my script.

I ran into this with a spider I wrote to feed my database with all available pages on my site: one link pointed to a page that simply contained the following code:

<?php
  header("Location: sections.php?section=home");
  exit();
?>

That made my script hang for a while, and apparently get_meta_tags() wasn't even able to return an error.

JP.
