“Measurement is fabulous. Unless you're busy measuring what's easy to measure as opposed to what's important” –Seth Godin.
So, what are these really important things to measure? Being a software enthusiast, I think a system’s or application’s performance is one thing which you cannot ignore. In general, we tend to measure at three levels: network, machine, and application level. Application metrics are mostly the hardest and the most important, of the three. Application metrics are very specific to your business, and they change as your applications change.
In this blog, I will be targeting the User Datagram Protocol (UDP) based containerized metric collector and tracing systems - StatsD and Jaeger with sample application integration.
Using the UDP protocol has some perks like, while tracing the application, the UDP protocol will not slow down the application. The UDP works on “fire and forget principle”. Your application tracing systems may or may not get the data and irrespective of it the application will stay up and running. Even if the stats are awry, it won’t affect the system or application’s performance.
StatsD is a front-end proxy for the Graphite/Carbon metrics server. The first implementation was developed with node.
You can install from PyPI:
$ pip install statsd
A typical StatsD integration with application looks like:
Fig - StatsD integration with application.
The configuration includes following components:
- StatsD Client - This is used to collect the metrics from the application. The client library injected to application and statsd injected code starts collecting the metrics and sends it to the server for processing.
- StatsD Server - StatsD client sends the data to StatsD server. StatsD Server parses the messages, extracts metrics data, and periodically flushes the data to graphite.
- Graphite - This is used for visualization purpose along with Carbon that is a high-performance service that listens to time-series data. StatsD Server sends the parsed data to Graphite. The graphite visualizes the metrics in different graphs.
Graphite is very easy to use and has very powerful graphing and data manipulation capabilities. Most importantly for StatsD, you can create new metrics in graphite just by sending it data for that metric. That means there’s no management overhead for engineers to start tracking something new. You can simply tell StatsD that you want to track “grue.dinners” and it’ll automatically appear in graphite. (By the way, because we flush data to graphite every 10 seconds, our StatsD metrics are nearly real-time.)
Here’s how we can do it using our Python StatsD library:
1. Track the counters:
from statsd import StatsClient // create statsd object
statsd = StatsClient(host=os.getenv('STATSD_SERVER', 'localhost'), port= os.getenv('STATSD_SERVER_PORT', 8125), prefix=None, maxudpsize=512) // send metric
The above line of code will create a new counter on the fly and increment it every time it’s executed.
You can then go have a look at your graph and check how many times the page has been requested.
Fig - sample graph generated in graphite.
Graphite visualizes the data received from statsD server in a meaningful way in the form of different graphs. Above is the graph for a webpage that shows how much time (in milliseconds) a particular page is accessed by users from different geographies during given time.
2. Track execution times:
// create timer foo_timer = statsd.timer('ShowSignUpTime') // start the timer foo_timer.start() // call showSignUp page ShowSignUp() // stop the timer foo_timer.stop()
The above code snippet will help you calculate the execution time as a part of the showSignUp() page.
We’re measuring the execution times as a part of our search infrastructure.
You can download the python source code from this repo here:
Using StatsD to monitor the Java-based application including jvm profiler and mysql.
If you are wondering - how to integrate your application with StatsD, here is a link that will help you understand StatsD integration with Java application:
Jaeger makes it easy to monitor complex, microservice-based architectures by using consistent upfront sampling with individual per service/endpoint probabilities. It is capable of monitoring distributed context propagation, distributed transaction monitoring, root cause analysis, service dependency analysis, and performance and latency optimization.
Microservices ecosystem brings its own challenges such as loss of visibility into the system and the complex interactions between services. Traditional monitoring tools such as metrics and distributed logging still have their place, but they often fail to provide visibility across services. This is where distributed tracing comes.
OpenTracing is a new, open standard for instrumenting applications and OSS packages for distributed tracing and monitoring. If you are looking to monitor a microservices-based architecture and perform root cause analysis of performance issues OpenTracing is just the thing for you. Here is a link to setup Jaeger (on Docker container) with HotRod Go application:
Fig - Jaeger Integration with application.
This diagram exhibits the architecture of how you can integrate Jaeger with your application and shows the mechanism of Jaeger to collect events from the application server. The setup includes following components:
- Jaeger Client - Jaeger clients are language-specific implementations of the OpenTracing API.
- Jaeger Agent - A network daemon that listens to spans sent over UDP, which it batches and sends to the collector. It is designed to be deployed to all hosts as an infrastructure component. The agent abstracts the routing and discovery of the collectors away from the client.
- Jaeger Collector - The collector receives traces from Jaeger agents and runs them through a processing pipeline.
- Jaeger UI - Web UI for Jaeger.
Keeping track of everything is a very important aspect in my opinion. And when it comes to tracking the performance of your application, which is an important aspect for any system, these two tools will certainly help you track what you need to track and that too without any hassle. It is very easy to instrument your applications with these two because of the rapport with almost any ecosystem. So, if you are looking for an end to end distributed system to trace and monitor your complex architectures, you know where and what to look for.