Digging through log files has got to be the least-sexy aspect of web app development. I’ve found that using MySQL makes this process a little less painful and easier to standardize. At the very least, you don’t end up writing five one-off Perl scripts, none of them reusable, every time you want to analyze a particular piece of data. Here’s the workflow that I’ve found works well for me:
- Grab the logs with your preferred method (`scp`, `wget`, `curl`, you name it).
- Select the columns you want out of the log file using `cut`. If your input is not tab-delimited, you will have to specify your delimiter with `-d` and override the output delimiter with `--output-delimiter`. Note that `cut` treats the delimiter string literally, so you need to pass a real tab character rather than `"\t"`. For example, to select the first and third columns out of a CSV file: `cut -d, -f1,3 --output-delimiter="$(printf '\t')"`.
- Create a new MySQL table that has one column per column in the log file. For example: `mysql -e "create table pageviews (page varchar(32), cputime int)" logs`
- Import the data from the text file: `mysqlimport --local logs pageviews.txt`. Please note: the table that you are importing into needs to have the same name as the text file. So if the file is called `httperrors.txt`, your MySQL table needs to be called `httperrors`.
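Put together, the pipeline looks something like this. This is a sketch: the `pageviews.csv` sample data and the `logs` database name are assumptions, and the MySQL steps need a running server, so they are shown as comments.

```shell
# Hypothetical sample log in CSV form: page,status,cputime
printf 'home,200,650\nabout,200,120\nhome,200,900\n' > pageviews.csv

# Keep columns 1 and 3, converting the delimiter to a real tab
# ("$(printf '\t')" produces a literal tab character portably)
cut -d, -f1,3 --output-delimiter="$(printf '\t')" pageviews.csv > pageviews.txt

cat pageviews.txt
# home	650
# about	120
# home	900

# The MySQL half needs a live server; with one running:
# mysql -e "create table pageviews (page varchar(32), cputime int)" logs
# mysqlimport --local logs pageviews.txt
```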
That’s about it. Now you can use familiar SQL to run any analysis on your data that you want.
For example: `SELECT page, COUNT(1) FROM pageviews WHERE cputime > 500 GROUP BY page ORDER BY 2 DESC;`
That will give me all the requests that took longer than 500 ms of CPU time broken down by page in my app. As simple as that.
How is that for data warehousing?