Centralized Log Analysis (Real Time) & Logging in JSON – PART 1

Logs are one of the most useful resources when it comes to analysis; in simple terms, log analysis is making sense of system/app-generated log messages (or just LOGS). Through logs we get insight into what is happening inside the system.

You can think of logs as the footprints generated by any activity within the system/app.

In the current context, the app is a web application, and the logs include both web server logs and application logs.

Centralized Logging:-
The need for centralized logging has become quite important nowadays due to:-
– growth in the number of applications,
– distributed architectures (Service Oriented Architecture),
– cloud-based apps,
– the number of machines and the size of the infrastructure increasing day by day.

This means that centralized logging and the ability to spot errors across distributed systems and applications have become even more “valuable” and “needed”.
And most importantly, to:
– understand customers and how they interact with the website;
– understand change: whether running A/B or multivariate experiments, or tweaking and understanding new implementations.

Need for standardization:-

Current State
Developers assume that the first-level consumer of a log message is a human, and only they know what information is needed to debug an issue.
Logs are not just for humans!
The primary consumers of logs are shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.
Logs change!
If the logs never changed, writing a custom parser might not be too terrible. The engineer would write it once and be done. But in reality, logs change.
Every time you add a feature, you start logging more data, and as you add more data, the printf-style format inevitably changes. This implies that the custom parser has to be updated constantly, consuming valuable development time.

Current State
Logging is done on individual application servers, making it much harder to consolidate for lookups; consolidation is usually done through SSH/tentakel scripts, which are hard to maintain and awkward to use for different searches, lookups, and ad hoc analysis.
Shell parsing scripts typically chain cat/tail/awk/sed and other operations, iterating over huge chunks of lines again and again without providing much flexibility for ad hoc analysis.

Suggested Approach

Logging in JSON Format and Centralized Logging:

To keep it simple and generic for any web app, the recommended approach is a {key: value} JSON log format (structured/semi-structured).
This approach makes parsing and consumption easy, irrespective of whichever technology/tools we choose to use!
Also, this way we don’t have to write complex and expensive regular expressions to parse the logs. It is better to simply emit log messages in a structured format from the “application itself”; this reduces any extra parsing work in the future too!

JSON logging gives you the ability to parse the log file programmatically even if the format changes over time. Developer-friendly formats like JSON are readable by both humans and machines.

JSON has a couple of advantages over other “structures”.
Widely adopted: Most engineers know what JSON is, and there is a usable JSON library for practically every language. This means there is little overhead to parsing logs.
Readable: Readability counts because engineers have to jump in and read the logs when there is a problem. JSON is text-based (as opposed to binary) and its format is a subset of JavaScript’s object literal notation (which most engineers are familiar with). In fact, properly formatted JSON is easier to read than logs formatted in ad hoc ways.

JSON libraries: JavaScript, Python, Ruby, Java, Perl.
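
For example, a minimal sketch in Python (the file name and field names are illustrative) of how a consumer can parse such logs without any custom regular expressions:

    import json

    # Each line of the (hypothetical) log file is one JSON-encoded event.
    with open("app.log.json") as f:
        for line in f:
            event = json.loads(line)
            # Keys added later are simply ignored by consumers that don't need them,
            # so the format can evolve without breaking this parser.
            if event.get("severity") == "ERROR":
                print(event["timestamp"], event.get("short_message"))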

1. Centralized Logging Server with Web Interface

Centralized log management is very important now and will play a key role in both operational excellence and complete visibility at any organization, making it easy to “access” and “analyze” log data.
So, the 3 key pieces of centralized log management would be:-
1. Collection (Event Collector) & Log Filtering
2. Indexing & Searching
3. Reporting & Visualizations

2. Producer / JSON Format

Instrument your application code to generate messages in the formats below!

2.1 JSON Message from Client Side
On the client side, use JavaScript instrumentation to generate a message in a format like:-

{
"timestamp": "2012-12-14T02:30:18",
"facility": "clientSide",
"clientip": "123.123.123.123",
"domain": "www.example.com",
"server": "abc-123",
"request": "/page/request",
"pagename": "funnel:example com:page1",
"searchKey": "1234567890_",
"sessionID": "11111111111111",
"event1": "loading",
"event2": "interstitial display banner",
"severity": "WARN",
"short_message": "....meaning short message for aggregation...",
"full_message": "full LOG message",
"userAgent": "...blah...blah..blah...",
"RT": 2
}

Add/emit all necessary data using keys like event1, event2, event3, etc.; whatever we want to measure and analyze.

2.2 JSON Message on Application Layer

Python example: https://github.com/madzak/python-json-logger

Nodejs example: https://github.com/trentm/node-bunyan

On the application layer you have many more metrics and values that can be added to the message, like:-


{
"timestamp": "2012-12-14T02:30:18",
"facility": "tomcat.example.application.app.ui",
"clientip": "123.123.123.123",
"domain": "www.example.com",
"method": "GET",
"status": 200,
"server": "abc-123",
"request": "/page/request.do",
"pagename": "funnel:example com:page1",
"searchKey": "1234567890_",
"sessionID": "11111111111111",
"event1": "Click",
"event2": "Click on page element 1",
"severity": "ERROR",
"short_message": "....meaning short message for aggregation...",
"full_message": "full LOG message",
"userAgent": "...blah...blah..blah...",
"RT": 400
}
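
A minimal sketch of producing such a message from Python with the python-json-logger library linked above (the logger name and the extra fields mirror the example; exact field names are up to the application):

    import logging
    from pythonjsonlogger import jsonlogger  # https://github.com/madzak/python-json-logger

    logger = logging.getLogger("tomcat.example.application.app.ui")
    handler = logging.StreamHandler()
    handler.setFormatter(jsonlogger.JsonFormatter())  # emit each record as one JSON object
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    # Application-specific keys are passed via "extra" and end up in the JSON message.
    logger.error("full LOG message", extra={
        "clientip": "123.123.123.123",
        "request": "/page/request.do",
        "status": 200,
        "event1": "Click",
        "RT": 400,
    })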

Please make sure that the application is emitting the data needed to quantify your users’ behavior and/or the information you need to be able to analyze and measure things accurately.

 

3. Transportation

3.1 Transporting Log Messages from the Client Side

To transfer the log message / web metrics data from the client to the centralized logging server:-
Use a 1×1 GIF/PNG image (also called a web beacon) requested via a GET from the client browser.
The domain to call is:-
On HTTP(s) pages: http(s)://logsmetrics.example.com/app1/?msg=<encoded message>
Notes:
a. msg should be a Base64-encoded, JSON-encoded dictionary/associative array
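
As an illustration of the encoding pipeline (shown here in Python for brevity; in practice the browser builds the msg parameter in JavaScript and the tracking server reverses the steps), with hypothetical field values:

    import base64, json
    from urllib.parse import urlencode

    event = {"timestamp": "2012-12-14T02:30:18", "facility": "clientSide",
             "pagename": "funnel:example com:page1", "severity": "WARN", "RT": 2}

    # JSON-encode, then Base64-encode, then URL-encode into the beacon's query string.
    msg = base64.b64encode(json.dumps(event).encode("utf-8")).decode("ascii")
    beacon_url = "https://logsmetrics.example.com/app1/?" + urlencode({"msg": msg})

    # The tracking server simply reverses the steps:
    decoded = json.loads(base64.b64decode(msg))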

3.2 Transporting Log Messages from the Application Server
Protocol: UDP
Logging: Syslog
Syslog-Host: syslog-logmetric.example.com
Port: 514
As most of the applications at example.com are written in Java and .Net, we can use popular logging libraries like log4j for Java and log4net/NLog for .Net apps.
For Java apps, use the log4j SyslogAppender to send UDP messages to the syslog server, like:

log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=syslog-lmetric.example.com
log4j.appender.SYSLOG.layout = org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern = %d [%t] %-5p %c- %m%n
log4j.appender.SYSLOG.Facility=Local3
log4j.appender.SYSLOG.FacilityPrinting=true
For .Net-based applications, use the log4net or NLog UDP appender to send UDP messages to the syslog server; for example, with log4net:-

<appender name="UdpAppender" type="log4net.Appender.UdpAppender">
  <param name="RemoteAddress" value="syslog-lmetric.example.com" />
  <param name="RemotePort" value="514" />
  <layout type="log4net.Layout.PatternLayout, log4net">
    <conversionPattern value="%-5level %logger [%property{NDC}] - %message%newline" />
  </layout>
</appender>
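
For applications in other languages (for instance Python), a rough sketch using the standard library's SysLogHandler to ship the same UDP messages to the syslog host (host, port, and facility taken from the settings above; the logger name is illustrative):

    import logging
    from logging.handlers import SysLogHandler

    # UDP (SOCK_DGRAM) is SysLogHandler's default transport.
    handler = SysLogHandler(
        address=("syslog-logmetric.example.com", 514),
        facility=SysLogHandler.LOG_LOCAL3,
    )
    logger = logging.getLogger("example.app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    # The message body itself can be the JSON-formatted event described earlier.
    logger.warning('{"severity": "WARN", "short_message": "meaningful short message", "RT": 400}')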

4. Collecting, Indexing, Searching

Note: The tested implementation uses Logstash.

4.1 Collector

Options:-
a. Syslog-NG as the collector!

Details: http://52.57.91.237/2009/06/23/centralized-logging-using-syslog-ng-splunk-indexing-search/
b. Alternatively, rsyslog can be used in place of syslog-ng!

c. Other collectors could be Graylog2, Logstash, and Fluentd.

 

4.2 Logstash

Configure Logstash as a centralized setup with event parsing.

Details: http://logstash.net/docs/1.1.10/tutorials/getting-started-centralized

Implementation with config details @ http://www.vmdoh.com/blog/centralizing-logs-lumberjack-logstash-and-elasticsearch


4.3 Tracking Server: Request Logging
Server: Nginx

App: Node.js

Base64 decoding:

https://github.com/kvz/phpjs/blob/master/functions/url/base64_decode.js
4.4 Filter
Apply filters like date, dns, grep, grok, json, split, etc. on the log messages received. For details, look at the Logstash Cookbook: http://cookbook.logstash.net/
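
A rough sketch of what the centralized Logstash configuration could look like: a syslog input, a json filter to expand the structured application message, and an ElasticSearch output. Option names follow later Logstash releases and may differ slightly in the 1.1.x version referenced above; check the cookbook for the exact syntax.

    input {
      syslog {
        port => 514
        type => "applog"
      }
    }

    filter {
      json {
        # parse the JSON payload emitted by the application into event fields
        source => "message"
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }
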
4.5 Indexing – ElasticSearch
Store log messages in ElasticSearch for indexing, as it is a distributed, RESTful search engine built on top of Apache Lucene.

  • Node: a running elasticsearch instance (a Java process). Usually every node runs on its own machine.
  • Cluster: one or more nodes with the same cluster name.
  • Index: more or less like a database.
  • Type: more or less like a database table.
  • Shard: effectively a Lucene index. Every index is composed of one or more shards. A shard can be a primary shard (or simply shard) or a replica.
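
Once events are indexed, ad hoc lookups become simple REST queries. A minimal sketch using Python's requests library against a local node (the logstash-* index pattern is the Logstash default; the severity and timestamp field names match the JSON messages above):

    import requests

    query = {
        "query": {"match": {"severity": "ERROR"}},   # all ERROR events
        "size": 10,
        "sort": [{"timestamp": {"order": "desc"}}],
    }
    resp = requests.get("http://localhost:9200/logstash-*/_search", json=query)
    for hit in resp.json()["hits"]["hits"]:
        print(hit["_source"].get("timestamp"), hit["_source"].get("short_message"))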

4.6 Web Interface
Search Interface:
A. Use Kibana: http://kibana.org/  (Search | Graph | Score | Stream)
B. Logstash Web: http://logstash:9292/search (not of much use if using Kibana)
Automated reporting of real-time events/messages from both the client side and the server side!

C. Visualization and trending on the counters from logs can be developed using Graphite: http://graphite.wikidot.com/.
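
As a small sketch of feeding a counter derived from the logs into Graphite via its plaintext protocol (the Graphite host and the metric name are assumptions; port 2003 is the default plaintext listener):

    import socket
    import time

    def send_to_graphite(metric, value, host="graphite.example.com", port=2003):
        # Graphite's plaintext protocol: one "<metric path> <value> <unix timestamp>" line per data point.
        line = "%s %s %d\n" % (metric, value, int(time.time()))
        sock = socket.create_connection((host, port))
        sock.sendall(line.encode("ascii"))
        sock.close()

    # e.g. the count of ERROR events seen in the last minute
    send_to_graphite("logs.app1.severity.error.count", 42)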

5. Availability
<TBD>

6. Assumptions & Philosophy
Key things to keep in mind/ Rules:-

A. Use timestamps for every event
B. Use unique identifiers (IDs) like transaction ID / user ID / session ID, or maybe append a unique user identification (UUID) number to track unique users.
C. Log in text format; that means avoid logging binary information!
D. Log anything that can add value when aggregated, charted, or further analyzed.
E. Use severity categories, e.g. "severity": "WARN", with levels like INFO, WARN, ERROR, and DEBUG.
F. The 80/20 rule: 80% of our goals can be achieved with 20% of the work, so don’t log too much 🙂
G. Keep the date/time and timezone NTP-synced on every producer and collector machine (# ntpdate ntp.example.com).

H. Reliability: Like video recordings... you don’t want to lose the most valuable shot, so you record every frame; later, during analysis, you may throw away the rest while picking your best shot/frame. Likewise, log events should be recorded with proper reliability so that you don’t lose any important and usable part, like that important video frame.

 

References:

Based on many articles found via Google, including:

http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/

http://blog.treasure-data.com/post/21881575472/log-everything-as-json-make-your-life-easier

http://blog.nodejs.org/2012/03/28/service-logging-in-json-with-bunyan/

rubyrep: master-master replication for PostgreSQL

rubyrep: Database replication that doesn’t hurt.

Unlike Oracle and MySQL, PostgreSQL doesn’t have a built-in replication solution, but there are many other replication solutions available for PostgreSQL, like those listed here:-

http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling

and some additional proprietary solutions built by different companies for their custom needs.

Mostly, people use Slony (http://www.slony.info/) as a master-slave replication solution. Slony is a “master to multiple slaves” replication system supporting cascading (e.g. a node can feed another node, which feeds another node…) and failover.

But Slony has limitations, as mentioned here:

  • Replicated tables must have a unique or primary key
  • It does not support replication of large objects
  • Schema changes are not propagated (though they can be coordinated)
  • It does not support synchronizing databases outside of replication
  • There are limitations on version compatibility; you cannot replicate from PostgreSQL 8.2 to PostgreSQL 8.4, for example
  • It is more difficult to set up than many other replication solutions

There are many new replication and clustering solutions out there, but most of them are still in development.

To provide master-master replication in PostgreSQL, the most commonly used solutions are:-

Bucardo

RubyRep

and RubyRep is the easiest to set up and configure.

RubyRep Mission:-

Development of an open-source solution for asynchronous, master-master replication of relational databases that is

  • ridiculously easy to use
  • database independent

It currently supports PostgreSQL and MySQL and is developed by Arndt Lehmann. He also provides great support on the RubyRep mailing list, especially for adding new features or fixing bugs.

RubyRep always operates on two databases. To make it simple to understand, the databases are referred to as “left” and “right” database respectively.

RubyRep’s key features include:

  • Simple configuration: the complete setup can be done via a single configuration file.
  • Simple installation: if you have a JVM installed, you just have to download and extract the files.
  • Platform independent: it runs on Unix and Windows platforms.
  • Table design independent: all commands work on tables no matter whether they have a simple primary key (all data types acceptable), a combined primary key, or no primary key at all. It successfully processes multi-byte text and “big” data types.
  • It replicates the tsvector datatype.

In addition to the above, RubyRep actually provides three tools in one: Compare, Sync, and Replicate.

Compare

This tool scans corresponding tables of left and right database, looking for diverging data. Key features of the comparison tool are:

  • Different output modes, from a count of differences to full row dumps.
  • Low bandwidth mode available, reducing the number of round-trips so only actual differences go through the network.
  • A progress bar with estimated remaining amount of work.
  • Server load is targeted toward only the “right” database server.

Sync

The sync tool is used to synchronize data in corresponding tables of a left and right pair of databases. Key features of the sync tool are:

  • All features of the Compare tool also apply to syncs
  • Automatically orders table syncs to avoid foreign key conflicts.
  • You can configure the sync policy to ignore deletes in the left database, or to ignore creating records in the right database, and other such combinations
  • Provides two prebuilt conflict resolution methods, either left db wins or right db wins
  • Custom conflict resolution methods specifiable via ruby code snippets
  • Merge decisions can optionally be logged in the rubyrep event log table.

Replicate

Of course RubyRep also provides a replication tool. Some of the key features of the replication tool include:

  • Automatically sets up all necessary triggers, log tables, etc.
  • Automatically discovers newly added tables and synchronizes the table content
  • Automatically reconfigures sequences to avoid duplicate key conflicts
  • Tracks changes to primary key columns
  • Can implement either master-slave or master-master replication
  • Prebuilt conflict resolution methods available include left or right wins, and earlier or later change wins
  • Custom conflict resolution specifiable via ruby code snippets
  • Replication decisions can optionally be logged in the rubyrep event log table

One of the problems common to replication solutions is setting up new nodes. With Slony, there are always some headaches caused by high load on the master database server as a result of the TRUNCATE/COPY cycle Slony goes through. In the case of RubyRep, most of the CPU load is on the slave server, and you can run the Sync command in advance, before you start replicating the database. RubyRep also provides some flexibility to skip the sync if you don’t want to sync the database again.

For installation refer to http://www.rubyrep.org/installation.html

Help

    # rubyrep --help
    Usage: /usr/local/bin/rubyrep [general options] command [parameters, ...]
    
    Asynchronous master-master replication of relational databases.
    
    Available options:
     --verbose                    Show errors with full stack trace
     -v, --version                    Show version information.
     --help                       Show this message
    
    Available commands:
     generate        Generates a configuration file template
     help            Shows detailed help for the specified command
     proxy           Proxies connections from rubyrep commands to the database
     replicate       Starts a replication process
     scan            Scans for differing records between databases
     sync            Syncs records between databases
     uninstall       Removes all rubyrep tables, triggers, etc. from "left" and "right" database
    
  • Generate configuration file:
        # rubyrep generate myrubyrep.conf

Check out http://www.rubyrep.org/tutorial.html for scan, sync, and replication configs.
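
A sketch of the typical workflow with the generated configuration file (the -c option for passing the config file follows the rubyrep tutorial; verify with rubyrep help <command>):

    # rubyrep scan -c myrubyrep.conf        (report differences between left and right databases)
    # rubyrep sync -c myrubyrep.conf        (one-time synchronization of diverging tables)
    # rubyrep replicate -c myrubyrep.conf   (start continuous master-master replication)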

Ref: Denish Patel Blog Post