
Distributed Tracing: From Theory to Practice

stellacotton

April 28, 2017

Transcript

  1. stella cotton | @practice_cactus
     “A distributed system is a collection of independent computers that appears to its users as a single coherent system.”
     Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice Hall, Second Edition, 2007
  2. [Diagram: "User Orders Items"; Web Request; Auth Process; Ecommerce Process. Labels: "Still Rails, but a new app", "Original app"]
  3. [Diagram: Web Request; Auth Process; Ecommerce Process; Orders; Recommendations; Billing. Labels: "Original app", "Python????"]
  4. [Diagram: Web Request; Auth Process; Ecommerce Process; Orders; Recommendations; Billing. Labels: "Why is this slow ????", "Blame data science?"]
  5. “You can’t tell a coherent macro story about your application by monitoring individual processes”
     Ben Sigelman
  6. • Need lots of data
     • Delayed results
     • Can’t guarantee causality
  7. def my_cool_system
       service_1
       service_2
     end

     def service_1
       Rails.logger.info "Service 1"
       execute_async_job
     end

     def execute_async_job
       Rails.logger.info "Async Job"
     end

     def service_2
       Rails.logger.info "Service 2"
     end

     Aggregated Log
     01-01-2001 01:01:01 Service 1
     01-01-2001 01:01:02 Async Job
     01-01-2001 01:01:03 Service 2
  8. def my_cool_system
       service_1
       service_2
     end

     def service_1
       Rails.logger.info "Service 1"
       execute_async_job
     end

     def execute_async_job
       sleep 15  # Simulate latency
       Rails.logger.info "Async Job"
     end

     def service_2
       Rails.logger.info "Service 2"
     end

     Aggregated Log (with latency)
     01-01-2001 01:01:01 Service 1
     01-01-2001 01:01:02 Service 2
     01-01-2001 01:01:17 Async Job
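
The reordered log above can be reproduced in plain Ruby. This is a minimal sketch, assuming a background thread stands in for the async job and a `Logger` writing to a `StringIO` stands in for the aggregated log; `run_system` and its `latency` parameter are hypothetical names that do not appear in the slides.

```ruby
require "logger"
require "stringio"

# Runs the three "services" and returns the log lines in the order written.
# `latency` (seconds) is how long the async job sleeps before logging.
def run_system(latency)
  io = StringIO.new
  logger = Logger.new(io)
  logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

  # service_1: log, then hand work to a background thread (the "async job")
  logger.info "Service 1"
  job = Thread.new do
    sleep latency
    logger.info "Async Job"
  end

  # service_2 runs without waiting for the async job
  logger.info "Service 2"

  job.join # wait so the job's line is in the log before we read it
  io.string.lines.map(&:chomp)
end

slow = run_system(0.2)
puts slow
```

Service 1 caused the async job, yet with latency its line lands after Service 2, so timestamps alone cannot tell which call triggered which.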
  9. “So, you want to trace your distributed system? Key design insights from years of practical experience”
     Raja R. Sambasivan, Rodrigo Fonseca, Ilari Shafer, Gregory R. Ganger
     http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf
  10. Trace: The story of a request’s journey through your system
  11. [Diagram: Web Request; Auth Process; Ecommerce Process; Orders; Recommendations; Billing]
      A trace tells this whole story
  12. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing. The label "Trace id 123" is repeated across the services]
  13. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
  14. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
      Trace id 123, Parent id 2, Span id 3
  15. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
      Trace id 123, Parent id 2, Span id 3
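
The id scheme on these slides can be sketched as a small data structure. This is an illustration only: the `Span` struct, the `path_to` helper, and the span names are assumptions, since the slides do not say which service owns which span.

```ruby
# A span as drawn on the slides: one trace id shared by every span,
# plus a parent id that links each span to its caller.
Span = Struct.new(:trace_id, :parent_id, :span_id, :name)

spans = [
  Span.new(123, nil, 1, "web request"), # names are hypothetical
  Span.new(123, 1,   2, "auth"),
  Span.new(123, 2,   3, "ecommerce"),
]

# Returns span names from the root down to the span with `span_id`,
# found by following parent ids upward until a span has no parent.
def path_to(spans, span_id)
  by_id = spans.to_h { |s| [s.span_id, s] }
  path = []
  current = by_id[span_id]
  while current
    path.unshift(current.name)
    current = by_id[current.parent_id]
  end
  path
end

# The root span is the only one without a parent.
root = spans.find { |s| s.parent_id.nil? }
puts path_to(spans, 3).join(" > ")
```

The shared trace id groups the spans into one request; the parent ids are what let a tracing UI rebuild the call tree.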
  16. Client Start 01:01:01
      Server Receive 01:01:02
      Server Send 01:01:03
      Client Receive 01:01:04
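
A short sketch of what these four timestamps make possible, using the times from the slide. The date and the variable names are assumptions added for illustration.

```ruby
# The four annotations from the slide. The date is an assumption.
annotations = {
  client_start:   Time.utc(2001, 1, 1, 1, 1, 1),
  server_receive: Time.utc(2001, 1, 1, 1, 1, 2),
  server_send:    Time.utc(2001, 1, 1, 1, 1, 3),
  client_receive: Time.utc(2001, 1, 1, 1, 1, 4),
}

# Whole round trip, measured entirely on the client's clock.
total_seconds = annotations[:client_receive] - annotations[:client_start]

# Time spent inside the server, measured entirely on the server's clock.
server_seconds = annotations[:server_send] - annotations[:server_receive]

# What is left over was spent outside the server (network, queuing).
network_seconds = total_seconds - server_seconds

puts "total: #{total_seconds}s, server: #{server_seconds}s, network: #{network_seconds}s"
```

Each subtraction uses two timestamps from the same machine, so the durations stay meaningful even when the client and server clocks disagree.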
  17. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
      Trace id 123, Parent id 2, Span id 3
  18. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Labels: Transport, Collector, Storage
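
One way to picture the transport, collector, and storage pieces named on this slide is a tiny in-memory pipeline. This is a sketch under loud assumptions: an in-process `Queue` stands in for a network transport, a `Hash` stands in for a database, and `report` and `collect` are hypothetical helper names.

```ruby
# Transport: an in-process queue stands in for a network transport.
transport = Queue.new

# Storage: a hash of trace id => spans stands in for a real database.
storage = Hash.new { |hash, trace_id| hash[trace_id] = [] }

# Tracer side: the instrumented app reports a finished span and moves on.
def report(transport, span)
  transport << span
end

# Collector side: drain the transport and persist spans grouped by trace.
def collect(transport, storage)
  until transport.empty?
    span = transport.pop
    storage[span[:trace_id]] << span
  end
end

report(transport, { trace_id: 123, parent_id: nil, span_id: 1 })
report(transport, { trace_id: 123, parent_id: 1, span_id: 2 })
collect(transport, storage)

puts storage[123].length
```

Keeping the reporting step separate from collection means the traced request does not wait on storage.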
  19. class RackApp
        def call(environment)
          [
            '200',
            {'Content-Type' => 'text/html'},
            ["Hello world"]
          ]
        end
      end

      Responds to .call()
      Takes an environment hash
      Returns: [status, header, body]
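
Because a Rack app is just an object that responds to `call`, it can be exercised without a web server. The sketch below repeats the slide's `RackApp` and calls it with a hand-built environment hash; the two keys in `env` are an assumption chosen for illustration, and a real server supplies many more.

```ruby
class RackApp
  def call(environment)
    ['200', {'Content-Type' => 'text/html'}, ["Hello world"]]
  end
end

# A hand-built environment hash; a real server fills in many more keys.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" }

# The return value destructures into the three parts named on the slide.
status, headers, body = RackApp.new.call(env)

puts status
puts headers["Content-Type"]
puts body.join
```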
  20. class TracingRackMiddleware
        def initialize(app)
          @app = app          # Initialize with our rack app
        end

        def call(env)
          @app.call(env)      # Execute our rack app or the next middleware in the chain
        end
      end
  21. class TracingRackMiddleware
        def initialize(app)
          @app = app
        end

        def call(env)
          trace(env) do       # Trace some stuff
            @app.call(env)
          end
        end
      end
  22. class TracingRackMiddleware
        def initialize(app)
          @app = app
        end

        def call(env)
          trace(env) do
            @app.call(env)
          end
        end

        def trace(env, &block)
          span = Span.new("authentication", generate_span_id)
          span.record(SERVER_RECV)          # Received a request
          status, headers, body = yield     # Execute our rack app
        ensure
          span.record(SERVER_SEND)          # Sending back to the client
        end
      end

      Non-pseudocode version:
      https://github.com/openzipkin/zipkin-ruby/blob/master/lib/zipkin-tracer/rack/zipkin-tracer.rb
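
The pseudocode above can be made runnable with a few stand-ins. This is a sketch, not the zipkin-tracer implementation: the `Span` class, the annotation string values, the random span id, and the `spans` reader are all assumptions added so the example can run and be inspected.

```ruby
# Assumed annotation values for this sketch.
SERVER_RECV = "sr"
SERVER_SEND = "ss"

# Hypothetical stand-in for a tracer's span: it only remembers what happened.
class Span
  attr_reader :name, :id, :annotations

  def initialize(name, id)
    @name = name
    @id = id
    @annotations = []
  end

  def record(annotation)
    @annotations << [annotation, Time.now]
  end
end

class TracingRackMiddleware
  # Kept in memory for demonstration only; a real tracer would ship spans
  # to a collector instead of holding them on the middleware.
  attr_reader :spans

  def initialize(app)
    @app = app
    @spans = []
  end

  def call(env)
    trace(env) { @app.call(env) }
  end

  private

  def trace(_env)
    span = Span.new("authentication", rand(2**64))
    @spans << span
    span.record(SERVER_RECV)
    yield # the Rack response becomes this method's return value
  ensure
    span.record(SERVER_SEND) if span # runs even if the app raises
  end
end

app = ->(_env) { ["200", { "Content-Type" => "text/html" }, ["Hello world"]] }
middleware = TracingRackMiddleware.new(app)
status, _headers, body = middleware.call({})
```

The `ensure` clause is the important design choice: the send annotation is recorded whether the wrapped app returns normally or raises.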
  23. # config/initializers/tracing.rb
      Rails.application.config.middleware.use TracingRackMiddleware, {
        service_name: "SERVICE_DOMAIN_NAME",
        service_port: 443,
        sample_rate: ENV.fetch("ZIPKIN_SAMPLE_RATE", 0.1).to_f,  # Sample a portion of requests
        json_api_host: ENV["ZIPKIN_HOST"]
      }
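
A sample rate like the one configured above can be read as "trace this fraction of requests". The sketch below shows one simple way such a decision could be made; `sample?` is a hypothetical helper, and real tracers may decide differently.

```ruby
# Decides whether a request is traced. `sample_rate` is the fraction of
# requests to keep (0.0..1.0); `random` is injectable so runs are repeatable.
def sample?(sample_rate, random: Random.new)
  random.rand < sample_rate
end

# Same lookup as the slide: fall back to 10% when the variable is unset.
sample_rate = ENV.fetch("ZIPKIN_SAMPLE_RATE", 0.1).to_f

# Count how many of 10,000 simulated requests would be traced at 10%.
rng = Random.new(42)
kept = 10_000.times.count { sample?(0.1, random: rng) }
puts "traced #{kept} of 10000 requests"
```

Sampling trades completeness for overhead: most requests skip the tracing work, while enough are kept to see typical behavior.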
  24. class TracingFaradayMiddleware
        def initialize(app)
          @app = app
        end

        def call(env)
          trace!(env) do |env|
            @app.call(env)    # Execute our http client
          end
        end
      end
  25. class TracingFaradayMiddleware
        def initialize(app)
          @app = app
        end

        def call(env)
          trace!(env) do |env|
            @app.call(env)
          end
        end

        def trace!(env, &block)
          env = set_headers(env)                        # Manipulate the headers
          span = Span.new("external_call", 1234)
          span.record(Trace::Annotation::CLIENT_SEND)   # Using client instead of server
          status, headers, body = yield env
        ensure
          span.record(Trace::Annotation::CLIENT_RECV)
        end
      end
  26. class TracingFaradayMiddleware
        def initialize(app)
          @app = app
        end

        def call(env)
          trace!(env) do |env|
            @app.call(env)
          end
        end

        def trace!(env, &block)
          env = set_headers(env)
          span = Span.new("external_call", 1234)
          span.record(Trace::Annotation::CLIENT_SEND)   # Client Send
          status, headers, body = yield env
        ensure
          span.record(Trace::Annotation::CLIENT_RECV)   # Client Receive
        end
      end
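
The slides do not show what `set_headers` does. One plausible shape, assuming Zipkin's B3 propagation header names, is sketched below. `with_b3_headers` is a hypothetical helper that works on a plain header hash rather than a Faraday env, and the id values are made up.

```ruby
# Returns a copy of `headers` with B3 propagation headers added, so the
# next service can attach its spans to the same trace.
def with_b3_headers(headers, trace_id:, span_id:, parent_id: nil, sampled: true)
  b3 = {
    "X-B3-TraceId" => trace_id,
    "X-B3-SpanId"  => span_id,
    "X-B3-Sampled" => sampled ? "1" : "0",
  }
  # The root span has no parent, so the header is omitted in that case.
  b3["X-B3-ParentSpanId"] = parent_id if parent_id
  headers.merge(b3)
end

outgoing = with_b3_headers(
  { "Accept" => "application/json" },
  trace_id: "463ac35c9f6413ad", # example ids, not from the slides
  span_id: "a2fb4a1d1a96d312",
  parent_id: "0020000000000001"
)

outgoing.each { |name, value| puts "#{name}: #{value}" }
```

Passing the ids in request headers is what carries the trace across the process boundary; without it, each service would start an unrelated trace.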
  27. def self.client
        Faraday.new(url: base_url) do |connection|
          connection.use TracingFaradayMiddleware   # Add our middleware
          connection.adapter Faraday.default_adapter
        end
      end
  28. End-to-End Tracing: Adoption and Use Cases
      Jonathan Mace, Brown University
      https://cs.brown.edu/~jcmace/papers/mace2017survey.pdf
  29. • 15 using Zipkin
      • 9 using internal solutions
      • 1 using other OSS solution
      • 1 using paid solution
      Jonathan Mace, Brown University
      https://cs.brown.edu/~jcmace/papers/mace2017survey.pdf
  30. Dependency matrix of:
      - Tracer
      - Transport Layer
      - Collection Layer
      - Storage Layer
  31. {
        "buildpacks": [
          {
            "url": "https://github.com/heroku/heroku-buildpack-apt"
          },
          {
            "url": "https://github.com/danp/heroku-buildpack-runit"
          }
        ]
      }
  32. Basic auth via htpasswd
      https://www.nginx.com/resources/admin-guide/restricting-access-auth-basic/
  33. # config/initializers/zipkin.rb (Our app’s configuration file)
      Rails.application.config.middleware.use ZipkinTracer::RackHandler, {
        service_name: "test.example.com",
        service_port: 443,
        json_api_host: ENV["ZIPKIN_HOST"]   # Where we’re sending traces
      }

      # Uses Basic Auth
      ENV["ZIPKIN_HOST"] = "https://username:[email protected]"
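
To show how credentials embedded in a URL relate to Basic Auth, the sketch below parses such a URL and builds the corresponding `Authorization` header value. The URL, including the `zipkin.example.com` host and the `password` value, is hypothetical and is not the value used in the talk.

```ruby
require "uri"

# Hypothetical collector URL with credentials in the userinfo part.
zipkin_host = "https://username:password@zipkin.example.com"
uri = URI.parse(zipkin_host)

# Basic Auth sends "user:password", Base64-encoded, in a request header.
# pack("m0") is Ruby's built-in Base64 encoding without line breaks.
credentials = ["#{uri.user}:#{uri.password}"].pack("m0")
authorization = "Basic #{credentials}"

puts uri.host
puts authorization
```

Base64 is an encoding, not encryption, so credentials sent this way are only as private as the connection that carries them.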
  34. module Tracing
        module SQL
          # Mimic log method
          def log(sql, name = "SQL", binds = [], statement_name = nil)
            # Wrap all sql calls and record the sql statement
            ZipkinTracer::TraceClient.local_component_span("sql query") do |span|
              span.record_tag("query", sql.to_s)
              super
            end
          end
        end
      end

      # Monkey Patching with Prepend
      class ::ActiveRecord::ConnectionAdapters::AbstractAdapter
        prepend Tracing::SQL
      end
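
The `prepend` technique can be tried without Rails. In this sketch `FakeAdapter` and `RECORDED_QUERIES` are hypothetical stand-ins for the ActiveRecord adapter and the tracer; only the use of `prepend` and `super` mirrors the slide.

```ruby
# Hypothetical stand-in for a database adapter with a `log` method.
class FakeAdapter
  def log(sql, name = "SQL")
    "#{name}: #{sql}"
  end
end

# Stand-in for the tracer: remembers every statement that passed through.
RECORDED_QUERIES = []

module Tracing
  module SQL
    # Same name and arguments as the method being wrapped.
    def log(sql, name = "SQL")
      RECORDED_QUERIES << sql.to_s
      super # falls through to FakeAdapter#log with the same arguments
    end
  end
end

# prepend places the module ahead of the class in method lookup,
# so Tracing::SQL#log runs first and `super` reaches the original.
class FakeAdapter
  prepend Tracing::SQL
end

result = FakeAdapter.new.log("SELECT 1")
puts result
puts FakeAdapter.ancestors.first
```

Unlike reopening the class and redefining `log`, this leaves the original method intact and reachable through `super`.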
  35. Evaluating Distributed Tracing Solutions:
      • Should you buy, build or adopt?
      • What are your infrastructure requirements and limitations?
      • How is it authenticated?
      • Do you have sensitive data? What will you do if it leaks?
      • Is everyone on board?
  36. Come say hi at the Heroku Booth today, 3:30pm-4:30pm
      @practice_cactus
      Plant Illustrations designed by Natkacheva / Freepik