Apache2Piwik Exporter, version 1.2, January 2012 (github)

This software is released under the GPL v3 license:
http://www.gnu.org/licenses/gpl-3.0.html

Description

Apache2Piwik is a script written in Python to enable exporting statistics from Apache logs to Piwik.

Requirements

Preparation

Before running the script please be sure to:

  • Prepare the `settings.py` configuration file (a sample is provided in settings.py.sample)
  • Create a backup of your Piwik MySQL database
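
For orientation, here is a minimal sketch of what a `settings.py` might contain. The option names are the ones mentioned in this README (PIWIK_PREFIX appears in a traceback quoted in the comments further down); every value and type is a placeholder assumption, and settings.py.sample remains the authoritative reference, including for the database connection settings that are not shown here.

```python
# settings.py -- illustrative sketch only; copy settings.py.sample for real use.
# Option names come from this README; all values and types are assumptions.

ID_SITE = 1                      # Piwik site id that receives the imported data

# Assumed here to be a list of file paths (not directories).
APACHE_LOG_FILES = ["/var/log/apache2/access.log"]

CHRONOLOGICAL_ORDER = True       # are the files listed in chronological order?
CONTINUE = False                 # run more than once on the same set of data?

PIWIK_PREFIX = "piwik_"          # prefix of the Piwik tables in MySQL

# Daemon mode (see "Running daemon").
LIFE = False                     # set to True before using start/stop
FREQUENCY_OF_READING = 60        # placeholder value; the unit is an assumption
```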

Running

$ cd apache2piwik
$ python apache2piwik.py

Running with options

You can override ID_SITE, APACHE_LOG_FILES, CHRONOLOGICAL_ORDER and CONTINUE from settings.py on the command line.

Examples:
$ python apache2piwik.py -c0 -b0 -f "log1.txt;log2.txt" -i3
$ python apache2piwik.py -c0 -b1 -f "log1.txt" -i3

Usage: apache2piwik.py [start|stop] [OPTIONS]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -f FILES, --file=FILES
                        Apache log files; file names should be ;-separated
                        (overrides APACHE_LOG_FILES)
  -b {0 or 1}, --chronological_order={0 or 1}
                        whether the files are chronologically ordered
                        (overrides CHRONOLOGICAL_ORDER)
  -c {0 or 1}, --continue={0 or 1}
                        set if you want to run the script more than once on
                        the same set of data (overrides CONTINUE)
  -i INT, --id_site=INT
                        Piwik site id (overrides ID_SITE)
  -r, --remove_cache    remove cache files
  -g, --goal            create data for goals
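
The override behaviour described above can be sketched as follows. This is not the script's own code: `resolve_settings` and the dictionary of defaults are hypothetical names, and the use of the standard-library `optparse` module is an assumption. Only the option letters and the setting names come from this README.

```python
from optparse import OptionParser


def resolve_settings(argv, defaults):
    """Return a copy of `defaults` with command-line overrides applied.

    `defaults` is a hypothetical dict standing in for the values in settings.py.
    """
    parser = OptionParser(usage="apache2piwik.py [start|stop] [OPTIONS]")
    parser.add_option("-f", "--file", dest="files")
    parser.add_option("-b", "--chronological_order", dest="chronological", type="int")
    parser.add_option("-c", "--continue", dest="continue_", type="int")
    parser.add_option("-i", "--id_site", dest="id_site", type="int")
    options, _positional = parser.parse_args(argv)

    resolved = dict(defaults)
    # An option that was not given stays None, so the settings.py value wins.
    if options.files is not None:
        resolved["APACHE_LOG_FILES"] = options.files.split(";")
    if options.chronological is not None:
        resolved["CHRONOLOGICAL_ORDER"] = bool(options.chronological)
    if options.continue_ is not None:
        resolved["CONTINUE"] = bool(options.continue_)
    if options.id_site is not None:
        resolved["ID_SITE"] = options.id_site
    return resolved


if __name__ == "__main__":
    settings = {"ID_SITE": 1, "APACHE_LOG_FILES": ["access.log"],
                "CHRONOLOGICAL_ORDER": True, "CONTINUE": True}
    print(resolve_settings(["-c0", "-b0", "-f", "log1.txt;log2.txt", "-i3"], settings))
```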

Goals

If you add new goals after the data are already in the database, run

$ python apache2piwik.py -g

to update the goals data in the database. Remember to delete the piwik_archive_blob and piwik_archive_numeric tables before viewing the results in Piwik.
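
As a rough illustration of that cleanup step, the sketch below builds the DROP statements for the archive tables from a list of table names (for example, the output of SHOW TABLES). The helper name `archive_drop_statements` is hypothetical, and matching by name prefix assumes that the archive tables may carry a date suffix after the two base names given above. Review the generated statements and keep your backup at hand before running them.

```python
def archive_drop_statements(table_names, prefix="piwik_"):
    """Build DROP TABLE statements for the Piwik archive tables.

    `prefix` is the table prefix of the Piwik installation (assumed "piwik_").
    Tables are matched by name prefix so that suffixed variants are included.
    """
    wanted = (prefix + "archive_blob", prefix + "archive_numeric")
    return ["DROP TABLE `%s`" % name
            for name in sorted(table_names)
            if name.startswith(wanted)]


if __name__ == "__main__":
    tables = ["piwik_log_visit", "piwik_archive_numeric_2012_01",
              "piwik_archive_blob_2012_01", "piwik_option"]
    for statement in archive_drop_statements(tables):
        print(statement)
```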

Running daemon

First set the LIFE variable in the settings to True and set FREQUENCY_OF_READING as needed.
To start a daemon on the log file, run

$ python apache2piwik.py start

and to stop

$ python apache2piwik.py stop

Credits

The script is sponsored by CLANMO, an award-winning German mobile interactive agency from Köln.

Clearcode supports further development of Apache2Piwik, including the upcoming GUI and binaries for Windows, Linux and Mac OS X.

Contact

Clearcode is a software development company. We offer consulting, IT development and configuration services for web analytics solutions, including Piwik.

Email: contact@clearcode.cc

20 Responses to “Apache2Piwik – import apache logs into Piwik”

  1. Oleg 26. Jun, 2011 at 12:36 pm #

    I use awstats for traffic analysis (CSS, images, video, etc.), for analysing 404 errors and for analysing Google crawls.
    Can I see the same statistics in Piwik?

    • Clearcode 27. Jun, 2011 at 12:49 pm #

      Unfortunately, this would require changes to the script. Piwik does not log 404 errors, and apache2piwik skips all log lines that have an HTTP error status.
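
To make the skipping behaviour concrete, here is a minimal sketch of a filter that drops log lines with an HTTP error status. It is not taken from apache2piwik: the regular expression, the `is_importable` helper and the choice of treating every status of 400 or above as an error are all assumptions.

```python
import re

# Matches the leading fields of Apache's common/combined log format:
# host, identity, user, [time], "request", status, size.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def is_importable(line):
    """Return True for a parseable line whose status is not a 4xx/5xx error."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # unparseable lines are skipped as well
    return int(match.group("status")) < 400


if __name__ == "__main__":
    ok = '127.0.0.1 - - [10/Oct/2011:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326'
    missing = '127.0.0.1 - - [10/Oct/2011:13:55:40 +0200] "GET /nope HTTP/1.1" 404 209'
    print(is_importable(ok), is_importable(missing))
```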

  2. Dagobert 26. Jun, 2011 at 12:39 pm #

    Hi, I just had a look at Piwik and it is great. Even better that I just read this post; it is exactly what I was missing.
    Really great.

    It just raises one question: what happens if I run an import every 24 hours with this script for a page that also has the JS tracker code?
    This question might seem a little strange, but it is not. I can easily include the corresponding HTML/JS snippet in WordPress, but not in all my other software. Since I want statistics for the whole site, I need to import Apache's access.log.
    Do I get double counts as a result?

    Cheers,
    Dagobert

    • Clearcode 27. Jun, 2011 at 12:51 pm #

      Thanks!

      apache2piwik is designed for importing statistics of web pages that DO NOT have the Piwik JS tracking code. If you run it on a site that has the Piwik JS tracking code installed, you can expect double counts.

      You can use apache2piwik to import historical logs from the period when you did not have the Piwik tracking code installed.

      • Dagobert 27. Jun, 2011 at 2:26 pm #

        Do you think it would be relatively easy to identify the duplicates in your script? If so, I might take a closer look at it and suggest an implementation of that feature.

        Otherwise, getting rid of awstats or similar tools is not really possible. That would be really sad, because Piwik is really great and friendlier than awstats.

        • Clearcode 27. Jun, 2011 at 5:11 pm #

          If WordPress resides under a certain URL, it’s pretty easy to exclude those URLs in the apache2piwik.conf file. apache2piwik will then import stats only for the other URLs, which do not have the JS tracker installed.
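
The idea of excluding URLs that are already tracked by the JS tracker could look roughly like this. The sketch does not reproduce the actual exclusion syntax of the configuration file; `EXCLUDED_PREFIXES`, `request_path` and `is_excluded` are hypothetical names, and `/blog/` is only an example of where WordPress might live.

```python
# Hypothetical example: WordPress (with the JS tracker) lives under /blog/.
EXCLUDED_PREFIXES = ("/blog/",)


def request_path(request):
    """Extract the path from a request field such as 'GET /blog/x HTTP/1.1'."""
    parts = request.split()
    return parts[1] if len(parts) >= 2 else None


def is_excluded(request, prefixes=EXCLUDED_PREFIXES):
    """Return True if the requested path starts with one of the excluded prefixes."""
    path = request_path(request)
    return path is not None and path.startswith(tuple(prefixes))


if __name__ == "__main__":
    for request in ("GET /blog/post-1 HTTP/1.1", "GET /shop/item HTTP/1.1"):
        action = "skip" if is_excluded(request) else "import"
        print(action, request)
```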

  3. Torsten 15. Aug, 2011 at 1:43 pm #

    Hi,

    thank you for this script – but I get an error saying:

    “UnboundLocalError: local variable ‘version’ referenced before assignment”

    after 640k inserts into piwik_log_link_visit_action. Have you experienced a problem like this before? Would it help if I sent the complete traceback?

    Best regards
    Torsten

    • Cri 09. Sep, 2011 at 1:35 pm #

      I’m getting this error too; it comes from the httpagentparser module. Did you find a solution?

      Here is the traceback:

      Traceback (most recent call last):
        File "apache2piwik.py", line 858, in <module>
          apache2piwik(DIR)
        File "apache2piwik.py", line 748, in apache2piwik
          start = apache2piwik_process_line(cursor, line, start,g)
        File "apache2piwik.py", line 639, in apache2piwik_process_line
          visitor = define_visit(match,line)
        File "apache2piwik.py", line 261, in define_visit
          user_agent = httpagentparser.detect(visitor['user_agent_original'])
        File "/usr/local/lib/python2.6/dist-packages/httpagentparser-0.9.6-py2.6.egg/httpagentparser/__init__.py", line 277, in detect
          if detector.detect(agent, result):
        File "/usr/local/lib/python2.6/dist-packages/httpagentparser-0.9.6-py2.6.egg/httpagentparser/__init__.py", line 68, in detect
          version = self.getVersion(agent)
        File "/usr/local/lib/python2.6/dist-packages/httpagentparser-0.9.6-py2.6.egg/httpagentparser/__init__.py", line 185, in getVersion
          return version.replace('_', '.')
      UnboundLocalError: local variable 'version' referenced before assignment

    • Cri 09. Sep, 2011 at 5:08 pm #

      Hi, release 0.9.7 of httpagentparser fixes the problem (at least in my case)

      http://pypi.python.org/pypi/httpagentparser/

  4. Alex 25. Nov, 2011 at 12:26 am #

    I installed Apache2Piwik, but after importing data I see only visitor data in Piwik. The other Piwik widgets have no data. I’m using Piwik 1.6. Has anyone else had the same issue?

    Thanks
    Alex

  5. Christiane 25. Nov, 2011 at 9:23 am #

    Very nice tool – helps me a lot!! Thank you.

    But on Windows (Server 2003) there is a problem with the paths to the Apache log files. Although everyone has full access, I always get “[Errno 13] Permission denied”. If I run apache2piwik with the -f option, everything works fine.

    Best Regards
    Christiane

    • Christiane 25. Nov, 2011 at 12:11 pm #

      Forget my previous comment. I misunderstood the option in settings.py. I thought directories could also be specified there.

      Best Regards
      Christiane

  6. Clearcode 30. Jan, 2012 at 3:08 pm #

    New version released – compatible with Piwik 1.6.

    • Richard 30. Jan, 2012 at 6:24 pm #

      Firstly, thank you for such a quick turnaround from my e-mail, great job!

      I can get Apache2Piwik to start processing my log, but it then stops again with the same error as before, i.e.:

      Started processing /root/clearcode-Apache2Piwik-194b534/usr/local/apache/logs/access_log file…
      Finished in 0m45s.
      Traceback (most recent call last):
        File "apache2piwik.py", line 869, in <module>
          apache2piwik(DIR)
        File "apache2piwik.py", line 777, in apache2piwik
          add_goals()
        File "apache2piwik.py", line 514, in add_goals
          cursor.execute('SELECT lv.idvisit, lva.idsite, lv.idvisitor, lva.server_time, lva.idaction_url, lva.idlink_va, lva.server_time, lv.referer_type, lv.referer_name, lv.referer_keyword, lv.visitor_returning, lv.visitor_count_visits, lv.visitor_days_since_first, lv.location_country, lv.location_continent, la.name FROM '+s.PIWIK_PREFIX+'log_link_visit_action AS lva JOIN '+s.PIWIK_PREFIX+'log_visit AS lv ON (lva.idvisit=lv.idvisit) JOIN '+s.PIWIK_PREFIX+'log_action AS la ON (la.idaction=lva.idaction_url) WHERE idaction_url IN '+idactions+' AND lva.idvisit NOT IN (SELECT idvisit FROM '+s.PIWIK_PREFIX+'log_conversion WHERE idgoal = %s)',(int(goal[1]),))
        File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 174, in execute
        File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
      _mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') AND lva.idvisit NOT IN (SELECT idvisit FROM piwik_log_conversion WHERE idgoal ' at line 1")

      So far I’ve tried both formats listed in cPanel’s httpd.conf file:

      "%h %l %u %t \"%r\" %>s %b"

      "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

      It does not accept the common/combined part; is it meant to?

      If I keep running it, will it overwrite the old data or duplicate it?

  7. vic 10. Feb, 2012 at 8:03 am #

    There’s a bug when the table prefix is empty. Table names should be quoted (there’s a table called “option”).

    • Clearcode 17. Feb, 2012 at 1:40 pm #

      We will correct it in the next release, for Piwik 1.7.
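
The quoting suggested in this thread can be sketched as below. With an empty prefix the table name is just `option`, which is a reserved word in MySQL, so an unquoted name breaks the query. The helper `quoted_table` is a hypothetical name, not a function from apache2piwik.

```python
def quoted_table(prefix, name):
    """Return the backtick-quoted MySQL identifier for a Piwik table.

    Backticks inside the identifier are doubled, as MySQL requires.
    """
    identifier = (prefix + name).replace("`", "``")
    return "`%s`" % identifier


if __name__ == "__main__":
    # With an empty prefix the bare name would collide with a reserved word.
    print("SELECT * FROM " + quoted_table("", "option"))
    print("SELECT * FROM " + quoted_table("piwik_", "log_visit"))
```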

  8. Vincent 17. Feb, 2012 at 7:53 am #

    Stuck here, wondering what is the problem???

    root > python apache2piwik.py

    Traceback (most recent call last):
      File "apache2piwik.py", line 83, in <module>
        gi = pygeoip.GeoIP('lib/GeoIP.dat', pygeoip.MEMORY_CACHE)
    AttributeError: 'module' object has no attribute 'GeoIP'
