Apache2Piwik Exporter, version 1.2, January 2012 (github)
This software is released under GPL v3 license:
http://www.gnu.org/licenses/gpl-3.0.html
Description
Apache2Piwik is a script written in Python to enable exporting statistics from Apache logs to Piwik.
Requirements
- Access to a Piwik 1.4, 1.5 or 1.6 installation.
- Access to Apache logs with read rights.
- Python 2.6 with components:
  - MySQLdb
  - GeoIP for Python (http://www.maxmind.com/app/python)
  - httpagentparser (http://pypi.python.org/pypi/httpagentparser/0.8.2)
Preparation
Before running the script please be sure to:
- Prepare the `settings.py` configuration file (a sample is provided in settings.py.sample)
- Create a backup of your Piwik MySQL database
Running
$ cd apache2piwik
$ python apache2piwik.py
Running with options
You can override the ID_SITE, APACHE_LOG_FILES, CHRONOLOGICAL_ORDER and CONTINUE values from settings.py on the command line.
Examples:
$ python apache2piwik.py -c0 -b0 -f "log1.txt;log2.txt" -i3
$ python apache2piwik.py -c0 -b1 -f "log1.txt" -i3
Usage: apache2piwik.py [start|stop] [OPTIONS]
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -f FILES, --file=FILES
                        Apache log files, file names should be ;-separated
                        (overrides APACHE_LOG_FILES)
  -b {0 or 1}, --chronological_order={0 or 1}
                        whether the files are chronologically ordered
                        (overrides CHRONOLOGICAL_ORDER)
  -c {0 or 1}, --continue={0 or 1}
                        set if you want to run the script more than once on
                        the same set of data (overrides CONTINUE)
  -i INT, --id_site=INT
                        Piwik site id (overrides ID_SITE)
  -r, --remove_cache    remove cache files
  -g, --goal            create data for goals
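The way command-line options take precedence over settings.py can be sketched as follows (Python 3). This is not the script's actual code: the `DEFAULTS` dictionary is a stand-in for settings.py with invented values, `effective_settings` is a hypothetical helper, and `argparse` is used here only for illustration. Only the option names and the setting names come from the help text above.

```python
import argparse

# Stand-in for settings.py; these values are invented for illustration.
DEFAULTS = {
    "ID_SITE": 1,
    "APACHE_LOG_FILES": ["access.log"],
    "CHRONOLOGICAL_ORDER": 1,
    "CONTINUE": 0,
}


def build_parser():
    """Build a parser that mirrors the overriding options listed above."""
    parser = argparse.ArgumentParser(prog="apache2piwik.py")
    parser.add_argument("-f", "--file", dest="files")
    parser.add_argument("-b", "--chronological_order", type=int, choices=[0, 1])
    # "continue" is a Python keyword, so store the value under another name.
    parser.add_argument("-c", "--continue", dest="cont", type=int, choices=[0, 1])
    parser.add_argument("-i", "--id_site", type=int)
    return parser


def effective_settings(argv, defaults=DEFAULTS):
    """Return the defaults with any option given on the command line applied."""
    opts = build_parser().parse_args(argv)
    settings = dict(defaults)
    if opts.files is not None:
        # File names are ;-separated, as in the examples above.
        settings["APACHE_LOG_FILES"] = [p for p in opts.files.split(";") if p]
    if opts.chronological_order is not None:
        settings["CHRONOLOGICAL_ORDER"] = opts.chronological_order
    if opts.cont is not None:
        settings["CONTINUE"] = opts.cont
    if opts.id_site is not None:
        settings["ID_SITE"] = opts.id_site
    return settings
```

Calling `effective_settings(["-c0", "-b0", "-f", "log1.txt;log2.txt", "-i3"])` corresponds to the first example command; options that are not given leave the settings.py value untouched.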
Goals
If you add new goals after the data is already in the database, run
$ python apache2piwik.py -g
to update the goals data in the database. Remember to delete the piwik_archive_blob and piwik_archive_numeric tables before viewing the results in Piwik.
Running daemon
First set the LIFE variable in settings.py to True and set FREQUENCY_OF_READING as you wish.
To start the daemon on a log file, run
$ python apache2piwik.py start
and to stop
$ python apache2piwik.py stop
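Conceptually, a daemon of this kind re-reads the log at a fixed interval and processes only what was appended since the last pass. The sketch below (Python 3) illustrates that idea and is not the script's own implementation: `read_new_lines` and `follow` are hypothetical names, and treating FREQUENCY_OF_READING as a polling interval in seconds is an assumption.

```python
import time


def read_new_lines(path, offset):
    """Return (lines, new_offset) for complete lines appended after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path, frequency, handle, max_polls=None):
    """Poll path every `frequency` seconds and pass each new line to handle().

    `frequency` plays the role assumed for FREQUENCY_OF_READING.
    max_polls=None means poll forever.
    """
    offset = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            handle(line)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(frequency)
```

Tracking a byte offset rather than a line count keeps each poll cheap on large logs; a production daemon would also need to handle log rotation, which this sketch ignores.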
Credits
The sponsor of the script is CLANMO, a German, award-winning mobile interactive agency from Köln.
Clearcode supports further development of Apache2Piwik and upcoming GUI and binaries for Windows/Linux & MacOS X.
Contact
Clearcode is a software development company. We offer consulting, IT development and configuration services for web analytics solutions, including Piwik.
Email: contact@clearcode.cc



I use awstats for traffic analysis (css, img, video etc.), for the analysis of 404 errors and to analyze Google crawls.
Can I see the same statistics in Piwik?
Unfortunately, this would need changes to the script. Piwik does not log 404 errors, and apache2piwik also skips all log lines that have an HTTP error status.
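To make the skipping behaviour concrete, here is a minimal sketch (Python 3) of filtering log lines by status code. It is not taken from apache2piwik: the regular expression, the `keep_line` helper and the cut-off at status 400 are all assumptions about what "HTTP status errors" means.

```python
import re

# Matches the start of a common/combined log line up to the status code.
# The request part allows backslash-escaped quotes inside the quoted string.
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(?:[^"\\]|\\.)*" (\d{3}) ')


def keep_line(line):
    """Return True for lines with a non-error status (assumed: below 400)."""
    m = LINE_RE.match(line)
    if m is None:
        return False  # unparseable lines are dropped as well
    return int(m.group(1)) < 400


def filter_lines(lines):
    """Yield only the lines that would be imported under this rule."""
    return (line for line in lines if keep_line(line))
```

Under this rule a 404 line never reaches the database, which is why awstats-style error reports cannot be reproduced from the imported data.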
Hi, I just had a look at Piwik and it is great. Even better that I just read this post. It is exactly what I was missing.
Really great.
It just raises one question: what happens if I do an import every 24h with this script for a page that also has the JS tracker code?
Now, this question might seem a little strange, but actually it is not. I can easily include the corresponding HTML/JS snippet in WordPress, but not in all the other software. Since I want to get the statistics of the whole page, I need to import Apache's access.log.
Do I get double counts as a result?
Cheers,
Dagobert
Thanks!
apache2piwik is designed for importing statistics of web pages that DO NOT have the Piwik JS tracking code. If you run it on a site that has the Piwik JS tracking code installed, you should expect double counts.
You can use apache2piwik to import historical logs from the time when you did not yet have the Piwik tracking code.
Do you think it would be relatively easy to identify the doubles in your script? If so, I might have a closer look at it and suggest an implementation of that feature.
Otherwise, getting rid of awstats or similar tools is not really possible. That would be really sad, because Piwik is really great and friendlier than awstats.
If WordPress resides under a certain URL, it's pretty easy to exclude those URLs in the apache2piwik.conf file. apache2piwik will then import stats only for the other URLs, which do not have the JS tracker installed.
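The exclusion idea can be sketched generically as a path-prefix check (Python 3). How apache2piwik actually expresses exclusions in its configuration is not shown here, so `EXCLUDED_PREFIXES`, the `/blog/` prefix and `is_excluded` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical example: WordPress (already tracked via JS) lives under /blog/.
EXCLUDED_PREFIXES = ("/blog/",)


def is_excluded(request_url, prefixes=EXCLUDED_PREFIXES):
    """Return True if the request path falls under an excluded prefix."""
    path = urlsplit(request_url).path
    return any(
        path == prefix.rstrip("/") or path.startswith(prefix)
        for prefix in prefixes
    )
```

Matching on the prefix including its trailing slash avoids excluding unrelated paths such as `/blogroll`, while the equality check still catches a bare `/blog`.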
Hi,
thank you for this script, but I get an error saying:
"UnboundLocalError: local variable 'version' referenced before assignment"
after 640k inserts into piwik_log_link_visit_action. Have you experienced a problem like this before? Would it help if I sent the complete traceback?
Best regards
Torsten
I'm having this error too. It comes from the httpagentparser module. Did you find a solution?
Here is the traceback:
Traceback (most recent call last):
  File "apache2piwik.py", line 858, in <module>
    apache2piwik(DIR)
  File "apache2piwik.py", line 748, in apache2piwik
    start = apache2piwik_process_line(cursor, line, start,g)
  File "apache2piwik.py", line 639, in apache2piwik_process_line
    visitor = define_visit(match,line)
  File "apache2piwik.py", line 261, in define_visit
    user_agent = httpagentparser.detect(visitor['user_agent_original'])
  File "/usr/local/lib/python2.6/dist-packages/httpagentparser-0.9.6-py2.6.egg/httpagentparser/__init__.py", line 277, in detect
    if detector.detect(agent, result):
  File "/usr/local/lib/python2.6/dist-packages/httpagentparser-0.9.6-py2.6.egg/httpagentparser/__init__.py", line 68, in detect
    version = self.getVersion(agent)
  File "/usr/local/lib/python2.6/dist-packages/httpagentparser-0.9.6-py2.6.egg/httpagentparser/__init__.py", line 185, in getVersion
    return version.replace('_', '.')
UnboundLocalError: local variable 'version' referenced before assignment
Hi, release 0.9.7 of httpagentparser fixes the problem (at least in my case)
http://pypi.python.org/pypi/httpagentparser/
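If upgrading is not an option, one defensive workaround is to catch the exception around the user-agent detection so that a single odd user agent does not abort a long import. This is only a sketch (Python 3) and not part of apache2piwik: `safe_detect` is a hypothetical helper that takes the detection function as a parameter, and returning an empty dictionary as the fallback is an assumption about what the caller can tolerate.

```python
def safe_detect(detect, user_agent, fallback=None):
    """Call detect(user_agent), returning a fallback if the parser trips up.

    `detect` would be httpagentparser.detect in the traceback above; it is
    passed in so this sketch does not depend on the library being installed.
    """
    try:
        return detect(user_agent)
    except UnboundLocalError:
        # The bug reported above: the parser found no version string.
        return {} if fallback is None else fallback


# Possible usage inside the import loop (names as in the traceback above):
#   user_agent = safe_detect(httpagentparser.detect,
#                            visitor['user_agent_original'])
```

Catching only UnboundLocalError keeps other, unrelated failures visible instead of silently hiding them.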
I installed Apache2Piwik, but after importing data I see only visitor data in Piwik. The other Piwik widgets have no data. I'm using Piwik 1.6. Does anyone have the same issue?
Thanks
Alex
Very nice tool – helps me a lot!! Thank you.
But on Windows (Server 2003) there is a problem with the Apache log file paths. Although everyone has full access, I always get "[Errno 13] Permission denied". If I run apache2piwik with the -f option, everything works fine.
Best Regards
Christiane
Forget my previous comment. I misunderstood the option in settings.py: I thought directories could also be specified there.
Best Regards
Christiane
New version released – compatible with Piwik 1.6.
Firstly, thank you for such a quick turnaround from my e-mail, great job!
I can get Apache2Piwik to start processing my log, but it then stops again with the same error as before, i.e.:
Started processing /root/clearcode-Apache2Piwik-194b534/usr/local/apache/logs/access_log file…
Finished in 0m45s.
Traceback (most recent call last):
  File "apache2piwik.py", line 869, in <module>
    apache2piwik(DIR)
  File "apache2piwik.py", line 777, in apache2piwik
    add_goals()
  File "apache2piwik.py", line 514, in add_goals
    cursor.execute('SELECT lv.idvisit, lva.idsite, lv.idvisitor, lva.server_time, lva.idaction_url, lva.idlink_va, lva.server_time, lv.referer_type, lv.referer_name, lv.referer_keyword, lv.visitor_returning, lv.visitor_count_visits, lv.visitor_days_since_first, lv.location_country, lv.location_continent, la.name FROM '+s.PIWIK_PREFIX+'log_link_visit_action AS lva JOIN '+s.PIWIK_PREFIX+'log_visit AS lv ON (lva.idvisit=lv.idvisit) JOIN '+s.PIWIK_PREFIX+'log_action AS la ON (la.idaction=lva.idaction_url) WHERE idaction_url IN '+idactions+' AND lva.idvisit NOT IN (SELECT idvisit FROM '+s.PIWIK_PREFIX+'log_conversion WHERE idgoal = %s)',(int(goal[1]),))
  File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 174, in execute
  File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
_mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') AND lva.idvisit NOT IN (SELECT idvisit FROM piwik_log_conversion WHERE idgoal ' at line 1")
So far I’ve tried both formats listed in cPanel’s httpd.conf file:
"%h %l %u %t \"%r\" %>s %b"
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
It does not accept the common/combined part. Is it meant to?
If I keep running it, will it just overwrite the old data or double it?
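For reference, the two formats quoted above (Apache's common and combined log formats) can be matched by a single regular expression in which the referer and user-agent fields are optional. The sketch below (Python 3) shows one way to do that; it is not the pattern apache2piwik uses, and `LOG_RE`, `parse_line` and the group names are hypothetical.

```python
import re

# A quoted field that may contain backslash-escaped characters.
QUOTED = r'(?:[^"\\]|\\.)*'

# "%h %l %u %t \"%r\" %>s %b"
COMMON = (
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>' + QUOTED + r')" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Optional tail of the combined format: \"%{Referer}i\" \"%{User-Agent}i\"
COMBINED_TAIL = (
    r'(?: "(?P<referer>' + QUOTED + r')" "(?P<user_agent>' + QUOTED + r')")?'
)

LOG_RE = re.compile("^" + COMMON + COMBINED_TAIL + r"\s*$")


def parse_line(line):
    """Return a dict of fields, or None if the line matches neither format."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

For a common-format line the `referer` and `user_agent` entries are `None`, so a caller can tell the two formats apart without a separate setting.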
There's a bug when the table prefix is empty. Table names should be quoted (there's a table called "option").
Will correct it in the next release, for Piwik 1.7.
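Quoting matters here because OPTION is a reserved word in MySQL, so an unprefixed table named option has to be wrapped in backticks. A minimal sketch of such quoting (Python 3) is shown below; `quoted_table` is a hypothetical helper, not the fix that went into the script.

```python
def quoted_table(prefix, name):
    """Return the backtick-quoted MySQL identifier for prefix + name."""
    full = prefix + name
    if "`" in full:
        # Refuse names that would break out of the quoting.
        raise ValueError("table name must not contain a backtick: %r" % full)
    return "`" + full + "`"


# With an empty prefix the reserved word is still safely quoted:
#   "SELECT * FROM " + quoted_table("", "option")
```

Table names cannot be passed as query parameters in MySQLdb, which is why they are assembled into the SQL string and need explicit quoting.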
Stuck here, wondering what is the problem???
root > python apache2piwik.py
Traceback (most recent call last):
  File "apache2piwik.py", line 83, in <module>
    gi = pygeoip.GeoIP('lib/GeoIP.dat', pygeoip.MEMORY_CACHE)
AttributeError: 'module' object has no attribute 'GeoIP'
You need to install Python GeoIP (http://www.maxmind.com/app/python) – as stated in requirements.
root@piwik:~/clearcode-Apache2Piwik-194b534# python2.6 apache2piwik.py
None
Your have unsupported version of piwik: 1.7
Is Piwik 1.7 not supported, or am I doing something wrong?
Thanks in advance.
Apache2Piwik for 1.7.1 will be released soon.
Is it easy to make a "spin-off" just to insert historical data from another stats tool? For me, only the daily visits are important.
Example:
past2piwik [site_id] [yyyymmdd] [visits]
or
past2piwik -f [visitsfile]
Many thanks in advance
Past data is supported, but the exact timestamp is taken from the log file. No need for the additional date parameter…
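As an illustration of taking the timestamp from the log line itself, the `%t` field of an Apache log (for example `10/Oct/2011:13:55:36 +0200`) can be parsed with the standard library. This is a sketch in Python 3, not the script's own parsing code; `parse_apache_time` is a hypothetical helper, and it relies on English month abbreviations, which is what `%b` expects in the default C locale.

```python
from datetime import datetime, timezone


def parse_apache_time(stamp):
    """Parse the contents of the [..] time field into an aware datetime."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def to_utc(stamp):
    """Return the same instant expressed in UTC."""
    return parse_apache_time(stamp).astimezone(timezone.utc)
```

Because the offset is part of the field, each visit keeps its exact time even when logs from servers in different time zones are imported together.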
To facilitate importing compressed logfiles, I altered the apache logfile open try block.
Most of our logfiles are bz2, so I did bz2 -> gzip -> regular open()
import bz2
import gzip
import sys

try:
    # try to open a bz2 compressed file
    try:
        f = bz2.BZ2File(apache_log_file, 'r')
        # seeking forces a read, which raises IOError if this is not bz2 data
        f.seek(1)
        f.seek(0)
    except IOError:
        # maybe it's a gzip file
        try:
            f = gzip.GzipFile(apache_log_file, 'r')
            f.seek(1)
            f.seek(0)
        except IOError:
            # ok, just open it, probably uncompressed
            f = open(apache_log_file, 'r')
except IOError as e:
    print(str(e).replace('[Errno 2] ', ''))
    sys.exit(0)
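An alternative to the trial-and-error chain above is to look at the first bytes of the file, since bz2 and gzip files start with well-known signatures (`BZh` and `\x1f\x8b`). The following sketch (Python 3) takes that approach; it is a different technique from the commenter's code, `open_log` is a hypothetical name, and decoding as UTF-8 with replacement is an assumption about the log encoding.

```python
import bz2
import gzip


def open_log(path):
    """Open a plain, gzip or bz2 log file for text reading, by signature."""
    with open(path, "rb") as f:
        magic = f.read(3)
    if magic.startswith(b"BZh"):
        # bzip2 stream header
        return bz2.open(path, "rt", encoding="utf-8", errors="replace")
    if magic.startswith(b"\x1f\x8b"):
        # gzip member header
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    # Anything else is treated as an uncompressed log.
    return open(path, "r", encoding="utf-8", errors="replace")
```

Usage would look like `with open_log(apache_log_file) as f: for line in f: ...`. Checking the signature avoids relying on which exception each decompressor raises for foreign data.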