Five Years At MakeMyTrip – Journey Of A Hacker

After I completed five years at MakeMyTrip.com on 28th Sep 2013, an overwhelming retrospection was only natural. This is a diary entry that spans 60 months and countless memories. Sitting at my desk, I’ve tried to put forth a not-so-dispassionate account of the people I met, the projects I worked on and the processes I created. Some highpoints…


Welcome to MakeMyTrip

My ‘journey’ here began in September 2008 as an Assistant Manager in the website operations team, a department that usually gets famous only when something goes down!

The first project was a (shared) responsibility to set up infrastructure at the MMT colocation/data centre at the Mumbai IDC (Internet Data Center). But a stronger memory is that of the THM (Town Hall Meet), where new entrants put together a stage act for everyone. My character was a ‘Sansani’ reporter (of ‘chain se sona hai to jaag jao’ fame) who reported MMT’s sale to the Haryana Government. Interviewing my teammates playing the roles of various Leadership Team members, and interviewing the Haryana Cattle and Agriculture Minister, was not something I’d get to do ever again. It was while preparing for this utopian-dystopian parody that I made some amazing friends, most of whom remain friends to this moment. The fun I had isn’t comparable to all the college fests put together.

A Techie’s Tale

This is clearly going to be the biggest part of this post and is best read with a cuppa coffee.

I started on a project called Cheetah – Selenium Robot. Seeing the NOC team manually monitor the health of makemytrip.com, I couldn’t resist automating the health checks and functional monitoring using Selenium-RC, with VNC sessions recorded via Pyvnc2swf so that test runs could be replayed as Flash movies. Looking ahead, Cheetah is something I’ll be working on to open-source after some improvements.
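For flavour, here’s a minimal sketch of what such a Selenium-RC health check might have looked like. It assumes the old Selenium-RC Python client (pre-WebDriver); the URL and the marker text are illustrative, not Cheetah’s actual checks:

from selenium import selenium  # Selenium-RC client, shipped with Selenium 1.x/2.x

# Connect to a Selenium-RC server on localhost:4444 and drive Firefox
sel = selenium("localhost", 4444, "*firefox", "https://www.makemytrip.com/")
sel.start()
try:
    sel.open("/")
    sel.wait_for_page_to_load("30000")  # timeout in milliseconds
    # A crude health check: the homepage should contain a known marker
    if not sel.is_text_present("Flights"):
        raise RuntimeError("Health check failed: marker text missing")
finally:
    sel.stop()

A NOC cron job running a handful of such scripts, with the VNC session captured by Pyvnc2swf, gives you both an alert and a replayable movie of what the site actually did.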

This was followed by a series of projects: the BizEye Sync Tool, MyClient (a MySQL web client), BizEye Twiki (the NOC web portal), performance tuning, an Exception Dashboard, implementation of various monitoring tools, and the migration of makemytrip.co.in to makemytrip.com.

A good thing about being at MakeMyTrip is that you get to work on business requirements and innovations in almost equal proportions. Being a mid-sized techno-travel startup in an emerging market means a lot of room for innovation. ‘Team Yoga’ – one such project of mine – saw me working with the best brains in MakeMyTrip. We worked on implementing the next-generation architecture (now famous as MMT 3.0), guided by a superb leader, Amit Somani. Team Yoga and our tech exploits taught me tricks and truths that refuse to leave my memory even after five years.

After Mumbai, next in line of colocation/data centers was the Chennai IDC. It was different only in that this time we used the best standards and solutions, followed by many memorable Hacknights wherein the team grew closer and stronger. My colleagues Munish and Ravikant were the superstars.

‘Apache’ has become a sort of leitmotif in my world – be it HTTPD, Tomcat, Apache Traffic Server, Hadoop or Kafka. And F5 BigIP and the love for iRules – LTM (Local Traffic Manager) & GTM (Global Traffic Manager) – taught me the best of load balancing, availability and performance for an Active-Active DR setup across two IDCs.

For payment card data security, PCI-DSS compliance and SOX assessment brought about changes in processes, applications and security procedures, which later also helped the company have smoother processes, faster approvals and easier lives. God bless open source tools!

Being a NASDAQ-listed company, MakeMyTrip has had its security woes and measures, which naturally came to my team. My first security revamp here included VAPT (Vulnerability Assessment and Penetration Testing) and the setup and configuration of an IPS (Intrusion Prevention System), SIEM (Security Information and Event Management) and a Web Application Firewall – one of our best investments in MMT’s information security.

AWeSOM time

Spreading awareness – something that impacts everyone, but indirectly – is always tough. It was this problem that gave birth to AWeSOM, or Awareness Week on Security Online and Mobile (coined by Abhishek, our content chap). In this event, which I oversaw, we did a lot of stuff to educate folks (read: writers, HR, admin, product, finance and legal teams) about technology, vulnerabilities, secure practices and oh so much more! The success of AWeSOM 1 led us to organize AWeSOM 2, which was another hit. It’s best to read the company blogposts on AWeSOM1 and AWeSOM2 for more details and some implied learning.


As irony would have it, after all the security measures I spearheaded at MakeMyTrip, my new designation was Information Security Manager.

Some Innovations

Business as usual aside, our team feels rather incomplete without innovating. Which is why, at various points in time, we created some amazing products for Trippers. MyIdeas – an internal idea collaboration portal – was one of the earliest. It saw anonymous ideas ranging from new products to grander offsites to richer coffee! And we did implement many of them :)

An organization chart to search for employees in a directory format was quite the need of the hour back then. It saved a lot of time that was otherwise spent hunting for numbers and names!

The last such project I enjoyed was Trippervilla, the aesthetic intranet of MakeMyTrip where Trippers read news, updates and policies, access useful links and do much more, to this day.

A techie’s tale will never be complete without Hackathons! I was a part of various Hackathons focused on innovation, business, technology and mobile. Some really quirky and stunning ideas were turned into reality overnight and literally saw what we called the ‘light of the day’. I think every company using technology should try this at least once and see the awesome results that 24 sleepless hours can produce. But don’t forget to include coffee, Red Bull and music :)

DevOps @ MMT & the Tools Arsenal

DevOps was pioneered to enable collaboration and alignment between developers and the Ops team. We made use of various open source tools for automation and streamlined various processes for better communication and clarity between the teams.

Some of the tools we used, developed or integrated: Puppet for configuration management, NOCMATE-Rx/DB Poller, an Inventory Management System, Syslog-NG, Logstash, a revamped monitoring solution using Graphite, Neerikshan and more.


I was so excited about the product and its logo that I couldn’t resist sharing it on Google+!

The best part about DevOps best practices is that they aren’t forever! This journey of recurring innovation still continues. Above all, it leaves me with ample free time for other pursuits :)

The number of workshops and conferences I attended is overwhelming! Here are some names I could think of…

  • DevOpsDays India – Started as an attendee and currently a humble speaker here
  • PyCon India – Here’s the workshop Konark and I delivered on Celery, a tool for background task processing
  • OSSCamps: Helped organisers conduct the un-conference at various places
  • AWS Meetups, Startup Weekend, Barcamp
  • TEDx – Gurgaon, Delhi and Chandigarh events
  • Nullcon, Security Byte and OWASP InfoSec India Conference

Luckily, I also got a chance to host a small DevOps Gurgaon Meetup at MMT, and luckily again, Karanbir Singh, the CentOS Project Lead, was visiting India. On 12th July 2011, he presented “CentOS: Beyond Distributed Engineering”. Check out more details on the DevOps Gurgaon Meetup.

The five years I spent here have made me a better professional, and not just a smarter techie. I’ve had the humble opportunity to train and mentor some teammates. Must say there’s no better way to learn than teaching.

I don’t know when or even how, but I got the tag of a problem solver – someone who gets the job done. These memories leave me truly humbled, delighted and proud of every right or wrong thing I did at MakeMyTrip.

The Advent of Python at MakeMyTrip

Python’s beginnings here date to when we started developing Python-coded tools like Cheetah and the BigIP Ops Panel (LBManager) to manage F5 BigIP LTM and GTM.

Then I met Konark Modi at OSSCamp (Chandigarh) and hired him in 2010. Once on board, Konark was on a Python spree. With a swiftness that still amazes all of us, he developed many applications on the Python stack. Some names at the top of my mind are NOCMATE-Rx (our monitoring solutions suite), Inventory Management, Dashboards and a small ETL framework (Pollers). But believe me, the list is much longer. Konark was later elected Vice President of PSSI for his contributions to Python Month & PyCon India.


Business Intelligence at MakeMyTrip

The bumpy ride to build a DWH (Data Warehouse) & Business Intelligence (BI) started in 2013. After initial discussions, we began our reconnaissance by collecting use-cases of data requirements and analysing internal and external data sources. This understanding led to the delivery of projects like LMetric (a service for near real-time clickstream events and user behaviour analysis) and work with BigData, NoSQL, Cassandra and Hadoop. The most memorable part was the SpagoBI implementation at Hoteltravel; I went to Phuket (Thailand) for 15 days. Besides the learning and exposure, the entertainment and refreshment that trip presented are incomparable.

Today and Tomorrow

After a roller-coaster ride of five years, I’m presently working on a “Personalization Service and Recommendation Engine”, enjoying the fun of learning and implementing data services and Machine Learning algorithms, and developing platforms for the same.

Every company has folks who do good work, but not every company rewards them. Must say I am glad to be at MakeMyTrip. Following are the work-related awards that I’ve been humbled to receive here, for my contributions to technology:

  • STAR Performer – Twice in two years
  • MakeMyTrip’s Most Creative Team award
  • MakeMyTrip Mega Mind Award for Innovation

These obviously exclude awards or recognitions received in sports, conferences and office events.

Phew! If you’re with me till here, dear reader, you must know that I’m truly nostalgic seeing these five years flash by in 1500 words. And I’m sure many more memories are waiting as the journey continues.

Instead of picking high points from the About Us page of MakeMyTrip, I’d like to tell you why I love this place. I love it for:

  • Freedom
  • Flexibility
  • Love for open source
  • Scale and impact of technology
  • The playground that it is for a hacker to experiment, solve and learn

MakeMyTrip is truly A FIT RECIPE! I’m afraid only fellow trippers will know what that means :)

Disclaimer: The opinions, descriptions and other information given above are shared with an intent to narrate my story. This post doesn’t seek to harm or offend any individual or organisation, nor does it share any sensitive information.

Centralized Log analysis (Real Time) & Logging in JSON – PART 1

Logs are one of the most useful things when it comes to analysis; in simple terms, log analysis is making sense out of system/app-generated log messages (or just LOGS). Through logs we get insights into what is happening in the system.

You may think of logs as the footprints generated by any activity within the system/app.

In the current context, the app is a web application, and logs include web logs and app logs.

Centralized Logging:-
The need for centralized logging has become quite important nowadays due to:-
- growth in the number of applications,
- distributed architectures (Service Oriented Architecture),
- cloud-based apps,
- the number of machines and the infrastructure size increasing day by day.

This means that centralized logging and the ability to spot errors across distributed systems & applications have become even more “valuable” & “needed”.
And most importantly, we need to:
- be able to understand customers and how they interact with our websites;
- understand change: whether using A/B or multivariate experiments, or tweaking and understanding new implementations.

Need for standardization:-

Current state: humans as consumers
Developers assume that the first-level consumer of a log message is a human, and only they know what information is needed to debug an issue.
Logs are not just for humans!
The primary consumers of logs are shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.
Logs change!
If logs never changed, writing a custom parser might not be too terrible. The engineer would write it once and be done. But in reality, logs change.
Every time you add a feature, you start logging more data, and as you add more data, the printf-style format inevitably changes. This implies that the custom parser has to be updated constantly, consuming valuable development time.

Current state: scattered logs
Logging is done on individual application servers, making it much harder to consolidate for lookups; consolidation is usually done through SSH/tentakel scripts and is very hard to maintain or use for different searches, lookups and ad hoc analysis.
Shell parsing scripts usually rely on cat/tail/awk/sed and other complex operations, iterating over huge chunks of lines again and again without providing much flexibility for ad hoc analysis.

Suggested Approach

Logging in JSON Format and Centralized Logging:

To keep it simple and generic for any webapp, the recommended approach is a {key: value} JSON log format (structured/semi-structured).
This approach makes parsing and consumption easy, irrespective of whatever technology/tools we choose to use!
Also, this way we don’t have to write complex and expensive regular expressions to parse the logs. So the better option for log formats is to simply emit them in a structured format from the “application itself”. This reduces any extra parsing in the future too!

JSON logging gives you the ability to parse the log file programmatically even if the format has changed over time. Developer-friendly formats like JSON are readable by humans and machines.

JSON has a couple of advantages over other “structures”:
- Widely adopted: Most engineers know what JSON is, and there is a usable JSON library for virtually every language. This means there is little overhead to parse logs.
- Readable: Readability counts because engineers have to jump in and read the logs if there is a problem. JSON is text-based (as opposed to binary) and its format is a subset of the JavaScript object literal (which most engineers are familiar with). In fact, properly formatted JSON is easier to read than logs formatted in ad hoc ways.

JSON libraries exist for JavaScript, Python, Ruby, Java, Perl and virtually every other language.
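As a quick illustration of “emit structured logs from the application itself”, here is a minimal, hypothetical Python sketch; the field names simply mirror the message formats shown later:

import json
import sys
import time

def log_event(severity, short_message, **fields):
    # Emit one self-describing JSON log line on stdout
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "severity": severity,
        "short_message": short_message,
    }
    record.update(fields)  # extra keys become new JSON fields
    sys.stdout.write(json.dumps(record) + "\n")

# Adding a field later doesn't break existing parsers:
log_event("WARN", "slow upstream", request="/page/request", RT=2)

Note the contrast with printf-style logs: a consumer doing json.loads() per line keeps working even as fields are added.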

1. Centralized Logging Server with Web Interface

Centralized log management is very important now and will play a key role in both operational excellence and complete visibility at any organization, making it easy to “access” and “analyze” log data.
So, the 3 key pieces of centralized log management would be:-
1. Collection (Event Collector) & Log Filtering
2. Indexing & Searching
3. Reporting & Visualizations

2. Producer / JSON Format

Instrument your application code to generate messages in the formats below!

2.1 JSON Message from Client Side
On the client side, use JavaScript instrumentation to generate a message in a format like:-

{
"timestamp": "2012-12-14T02:30:18",
"facility": "clientSide",
"clientip": "123.123.123.123",
"domain": "www.example.com",
"server": "abc-123",
"request": "/page/request",
"pagename": "funnel:example com:page1",
"searchKey": "1234567890_",
"sessionID": "11111111111111",
"event1": "loading",
"event2": "interstitial display banner",
"severity": "WARN",
"short_message": "....meaning short message for aggregation...",
"full_message": "full LOG message",
"userAgent": "...blah...blah..blah...",
"RT": 2
}

Add / emit all necessary data using keys like event1, event2, event3 and so on – whatever we want to measure and analyze.

2.2 JSON Message on Application Layer

Python example: https://github.com/madzak/python-json-logger

Nodejs example: https://github.com/trentm/node-bunyan

On the application layer you have many more metrics and values that can be added to the message, like:-


{
"timestamp": "2012-12-14T02:30:18",
"facility": "tomcat.example.application.app.ui",
"clientip": "123.123.123.123",
"domain": "www.example.com",
"method": "GET",
"status": 200,
"server": "abc-123",
"request": "/page/request.do",
"pagename": "funnel:example com:page1",
"searchKey": "1234567890_",
"sessionID": "11111111111111",
"event1": "Click",
"event2": "Click on page element 1",
"severity": "ERROR",
"short_message": "....meaning short message for aggregation...",
"full_message": "full LOG message",
"userAgent": "...blah...blah..blah...",
"RT": 400
}

Please make sure that the application emits the data needed to quantify your users’ behavior and/or the information you need to be able to analyze and measure things accurately.
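As a concrete (hedged) example of the Python route linked above, python-json-logger plugs into the stdlib logging module; the JsonFormatter class and the behaviour of ‘extra’ are per that project’s documentation:

import logging
from pythonjsonlogger import jsonlogger  # pip install python-json-logger

logger = logging.getLogger("app.ui")
handler = logging.StreamHandler()
handler.setFormatter(jsonlogger.JsonFormatter())  # render each record as JSON
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Keys passed via 'extra' surface as top-level JSON fields
logger.error("full LOG message",
             extra={"pagename": "funnel:example com:page1", "RT": 400})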

 

3. Transportation

3.1 Transporting Log Messages from Client Side

To transfer log messages / web-metrics data from the client to the centralized logging server:-
Use a 1×1 GIF/PNG image (also called a web beacon) as a GET request from the client browser.
The domain to call is:-
On HTTP(s) pages: http(s)://logsmetrics.example.com/app1/?msg=<encoded message>
Notes:
a. msg should be a Base64-encoded, JSON-encoded dictionary/associative array
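On the server side, decoding such a beacon is a couple of lines of Python; a round-trip sketch (function and payload names are illustrative):

import base64
import json

def decode_beacon_msg(msg_param):
    # Decode the ?msg= parameter: Base64-wrapped, JSON-encoded dictionary
    return json.loads(base64.b64decode(msg_param).decode("utf-8"))

# Round trip, the way a client would encode it before the GET request:
payload = {"facility": "clientSide", "event1": "loading", "RT": 2}
encoded = base64.b64encode(json.dumps(payload).encode("utf-8"))
assert decode_beacon_msg(encoded) == payload

(In practice the parameter also needs URL-encoding, or the URL-safe Base64 alphabet.)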

3.2 Transporting Log Messages from Application Server
Protocol: UDP
Logging: Syslog
Syslog-Host: syslog-logmetric.example.com
Port: 514
As most of the applications in this example are written in Java and .Net, we can use popular logging libraries like log4j for Java and log4net/NLog for .Net apps.
For Java apps, use the log4j appender to send UDP messages to the syslog server, like:

log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=syslog-lmetric.example.com
log4j.appender.SYSLOG.layout = org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern = %d [%t] %-5p %c- %m%n
log4j.appender.SYSLOG.Facility=Local3
log4j.appender.SYSLOG.FacilityPrinting=true
For .Net-based applications, use the log4net or NLog appender to send UDP messages to the syslog server; for log4net:-

<appender name="UdpAppender" type="log4net.Appender.UdpAppender">
  <param name="RemoteAddress" value="syslog-lmetric.example.com" />
  <param name="RemotePort" value="514" />
  <layout type="log4net.Layout.PatternLayout, log4net">
    <conversionPattern value="%-5level %logger [%property{NDC}] - %message%newline" />
  </layout>
</appender>
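For Python apps, the stdlib SysLogHandler can feed the same UDP/syslog channel; a minimal sketch, assuming the host, port and facility configured above:

import logging
import logging.handlers

# UDP syslog, matching Protocol/Port/Facility in section 3.2
handler = logging.handlers.SysLogHandler(
    address=("syslog-logmetric.example.com", 514),
    facility=logging.handlers.SysLogHandler.LOG_LOCAL3,
)
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.error("full LOG message")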

4. Collecting, Indexing, Searching

Note: implementation tested using Logstash

4.1 Collector

Options:-
a. Syslog-NG as the collector!
Details: http://piyush.me/2009/06/23/centralized-logging-using-syslog-ng-splunk-indexing-search/
b. Alternatively, Rsyslog can be used in place of Syslog-NG!
c. Other collectors could be Graylog2, Logstash and Fluentd.

 

4.2 Logstash

Configure Logstash as a centralized setup with event parsing.

Details: http://logstash.net/docs/1.1.10/tutorials/getting-started-centralized

Implementation with config details @ http://www.vmdoh.com/blog/centralizing-logs-lumberjack-logstash-and-elasticsearch


4.3 Tracking Server: Request Logging
Server: Nginx

App: Nodejs

Base64 decoding:
https://github.com/kvz/phpjs/blob/master/functions/url/base64_decode.js

4.4 Filter
Apply filters like date, dns, grep, grok, json, split etc. on the log messages received. For details, look at the Logstash Cookbook: http://cookbook.logstash.net/
4.5 Indexing – ElasticSearch
Store log messages in ElasticSearch for indexing, as it is a distributed, RESTful search engine built on top of Apache Lucene.

  • Node: an elasticsearch instance running (a java process). Usually every node runs on its own machine.
  • Cluster: one or more nodes with the same cluster name.
  • Index: more or less like a database.
  • Type: more or less like a database table.
  • Shard: effectively a lucene index. Every index is composed of one or more shards. A shard can be a primary shard (or simply shard) or a replica.
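Once messages are indexed, ElasticSearch’s REST search API is queryable from anything that speaks HTTP; a hedged sketch (the host and the logstash-* index pattern, Logstash’s default, are assumptions):

import json
from urllib.request import urlopen

# Lucene query-string search over all Logstash-created indices
url = "http://localhost:9200/logstash-*/_search?q=severity:ERROR&size=5"
with urlopen(url) as resp:
    result = json.load(resp)

for hit in result["hits"]["hits"]:
    print(hit["_source"].get("short_message"))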

4.6 Web Interface
Search interface:
A. Use Kibana: http://kibana.org/ (Search | Graph | Score | Stream)
B. Logstash Web: http://logstash:9292/search (not of much use if using Kibana)
This enables automated reporting of real-time events / messages from both the client side and the server side!

C. Visualization and trending on counters derived from logs can be built using Graphite: http://graphite.wikidot.com/.
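For the Graphite option, counters can be pushed to Carbon’s plaintext protocol (“metric value timestamp” lines on TCP 2003); a minimal sketch with an assumed host and metric name:

import socket
import time

def send_counter(metric, value, host="graphite.example.com", port=2003):
    # Send one datapoint using Carbon's plaintext line protocol
    line = "%s %s %d\n" % (metric, value, int(time.time()))
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()

send_counter("logs.app1.severity.error.count", 12)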

5. Availability
<TBD>

6. Assumptions & Philosophy
Key things to keep in mind / rules:-

A. Use timestamps for every event.
B. Use unique identifiers (IDs) like Transaction ID / User ID / Session ID, or maybe append a unique user identification (UUID) number to track unique users.
C. Log in text format, i.e. avoid logging binary information!
D. Log anything that can add value when aggregated, charted or further analyzed.
E. Use severity categories: DEBUG, INFO, WARN and ERROR (e.g., "severity": "WARN").
F. The 80/20 rule: 80% of our goals can be achieved with 20% of the work, so don’t log too much :)
G. Keep an NTP-synced date/time and timezone on every producer and collector machine (# ntpdate ntp.example.com).

H. Reliability: Like video recordings – you don’t want to lose the most valuable shot, so you record every frame and only later, during analysis, throw away the rest, picking your best shot / frame. Here too, logs are events being recorded, and they should be recorded with proper reliability so that you don’t lose any important and usable part – like that important video frame.

 

References:


http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/

http://blog.treasure-data.com/post/21881575472/log-everything-as-json-make-your-life-easier

http://blog.nodejs.org/2012/03/28/service-logging-in-json-with-bunyan/