
Scaling Grails at SmartThings

Like most startups using Grails, it didn't take long before SmartThings had built a monolithic Grails application. This talk covers a few scaling issues we've run into along the way, how we've overcome them, and how we continue to use Grails as a core technology in our cloud platform.

Ryan Applegate

July 29, 2016

Transcript

  1. Who am I
    •  Ryan Applegate
    •  Lead Software Architect @ SmartThings
    •  @rappleg on Twitter and GitHub
  2. Agenda
    •  What is SmartThings?
    •  Building/Deploying a Grails monolith
    •  Databases
    •  Caches
    •  JVM Tuning with Groovy
    •  Rate Limiting
    •  When you outgrow your plugins
    •  Where do we go from here?
  3. Building a monolith
    Core cloud platform (deployed to AWS). Grails was a great fit for startup needs:
    •  APIs for mobile clients
    •  RabbitMQ for queue processing
    •  MySQL DB (RDS)
    The codebase grew fast (~175k LOC).
  4. Deploying a monolith
    The same Grails codebase is deployed with different configurations as separate clusters:
    •  API (mobile clients, etc.)
    •  Devices (messages from devices)
    •  SmartApps (device subscriptions)
    •  Scheduler (execute at a certain time)
    •  System Jobs, etc.
    Clusters are for isolated workloads, predictability, and scalability.
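The slide doesn't show how a single build decides which cluster it is running as. A minimal sketch in plain Java, assuming the role arrives as one config value; the `cluster.role` property, the `ClusterRole`/`Workload` enums and the role-to-workload mapping are all hypothetical:

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical roles, mirroring the clusters listed on the slide.
enum ClusterRole { API, DEVICES, SMARTAPPS, SCHEDULER, SYSTEM_JOBS }

// Hypothetical workloads that one build can switch on or off.
enum Workload { HTTP_API, DEVICE_CONSUMER, SMARTAPP_CONSUMER, SCHEDULE_RUNNER, SYSTEM_CRONS }

final class ClusterConfig {
    private ClusterConfig() {}

    // Parses a config value such as "smartapps" into a role.
    static ClusterRole parseRole(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("cluster role must be configured");
        }
        return ClusterRole.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Same artifact everywhere; each role enables only the workload it owns,
    // so a backlog on one cluster can't starve another.
    static Set<Workload> workloadsFor(ClusterRole role) {
        switch (role) {
            case API:         return EnumSet.of(Workload.HTTP_API);
            case DEVICES:     return EnumSet.of(Workload.DEVICE_CONSUMER);
            case SMARTAPPS:   return EnumSet.of(Workload.SMARTAPP_CONSUMER);
            case SCHEDULER:   return EnumSet.of(Workload.SCHEDULE_RUNNER);
            case SYSTEM_JOBS: return EnumSet.of(Workload.SYSTEM_CRONS);
            default:          throw new IllegalStateException("unknown role " + role);
        }
    }

    public static void main(String[] args) {
        // The "cluster.role" system property is an assumption for this sketch.
        ClusterRole role = parseRole(System.getProperty("cluster.role", "api"));
        System.out.println(role + " -> " + workloadsFor(role));
    }
}
```

Keeping the mapping in one place means adding a cluster is a config change plus one enum entry, not a new codebase.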
  5. Canary Deployments
    Deploy a single instance with the new code; it can go to any set of clusters or shards. This gives zero-downtime deployments. Metrics on the canary determine whether the deploy should be rolled back or forward before the old servers are shut down:
    •  CPU
    •  DB connections
    •  Error rates
    •  Latency
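The roll-back-or-forward decision can be expressed as a comparison of the canary against the existing fleet on the four metrics above. This is only a sketch: the `DeployMetrics`/`CanaryCheck` names and the 20% tolerance are assumptions, not SmartThings' actual thresholds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Snapshot of the four metrics from the slide, for the canary or the current fleet.
final class DeployMetrics {
    final double cpuPercent;
    final double dbConnections;
    final double errorRate;   // errors per request
    final double latencyMs;   // e.g. p99 latency

    DeployMetrics(double cpuPercent, double dbConnections, double errorRate, double latencyMs) {
        this.cpuPercent = cpuPercent;
        this.dbConnections = dbConnections;
        this.errorRate = errorRate;
        this.latencyMs = latencyMs;
    }
}

final class CanaryCheck {
    // Assumed tolerance: the canary may be at most 20% worse than the baseline on any metric.
    static final double TOLERANCE = 0.20;

    private CanaryCheck() {}

    // Names of the metrics on which the canary regressed; empty means "roll forward".
    static List<String> regressions(DeployMetrics baseline, DeployMetrics canary) {
        List<String> failed = new ArrayList<>();
        compare(failed, "cpu", baseline.cpuPercent, canary.cpuPercent);
        compare(failed, "dbConnections", baseline.dbConnections, canary.dbConnections);
        compare(failed, "errorRate", baseline.errorRate, canary.errorRate);
        compare(failed, "latency", baseline.latencyMs, canary.latencyMs);
        return Collections.unmodifiableList(failed);
    }

    static boolean rollForward(DeployMetrics baseline, DeployMetrics canary) {
        return regressions(baseline, canary).isEmpty();
    }

    // With a zero baseline, any non-zero canary value counts as a regression.
    private static void compare(List<String> failed, String name, double baseline, double canary) {
        if (canary > baseline * (1.0 + TOLERANCE)) {
            failed.add(name);
        }
    }

    public static void main(String[] args) {
        DeployMetrics fleet = new DeployMetrics(50, 100, 0.01, 200);
        DeployMetrics canary = new DeployMetrics(52, 104, 0.05, 210);
        System.out.println(rollForward(fleet, canary)
                ? "roll forward"
                : "roll back: " + regressions(fleet, canary));
    }
}
```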
  6. Monitoring Tools
    •  DataDog (Dropwizard metrics, etc.)
    •  SumoLogic (log aggregation, dashboards)
    •  MonYOG (RDS monitoring)
    •  AppDynamics (application tracing)
    •  OpsCenter (Cassandra)
    •  PagerDuty (alerting)
    •  AWS console (CloudWatch, etc.)
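  7. Many to Many Gotcha
    DeviceType:
        static belongsTo = Capability
        static hasMany = [ capabilities: Capability ]
    Capability:
        static hasMany = [ deviceTypes: DeviceType ]
    How expensive is deviceType.addToCapabilities(…)?
    (A GORM hasMany collection is a Set by default, so adding an element forces the lazily loaded collection to be initialized to guarantee uniqueness; the cost grows with the number of existing associations.)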
  8. Manage many to many yourself
    DeviceType:
        static transients = ['capabilities']

        Set<Capability> getCapabilities() {
            CapabilityDeviceType.findAllByDeviceTypeId(this.id).collect { it.capability } as Set
        }
    Capability:
        static transients = ['deviceTypes']

        Set<DeviceType> getDeviceTypes() {
            CapabilityDeviceType.findAllByCapabilityId(this.id).collect { it.deviceType } as Set
        }
  9. Implementing the mapping table
    class CapabilityDeviceType implements Serializable {
        DeviceType deviceType
        Capability capability

        static CapabilityDeviceType create(DeviceType dt, Capability c) {
            // Persist the join row directly; neither side's collection is loaded
            new CapabilityDeviceType(deviceType: dt, capability: c).save()
        }
        …
    }

    CapabilityDeviceType.create(deviceType, capability)
  10. Transactional Overhead
    •  Persistent store is a MySQL DB (max ~5600 connections per instance)
    •  Need to be mindful of DB connections and the overhead caused by unnecessary transactions
    •  @Transactional triggers a check of tx_isolation when the transaction starts
    •  A commit at the end persists changes to the DB
    •  JDBC pool exhaustion is very expensive
  11. Default Grails transactional behavior
    class FooService {
        String getFoo() {
            return "bar"
        }
    }
    Is getFoo() transactional?
  12. Turning off transactions if not needed
    class FooService {
        static transactional = false

        String getFoo() {
            return "bar"
        }
    }
  13. •  Persistent store is a MySQL DB (max ~5600 connections per instance)
    •  Need to be mindful of DB connections and the overhead caused by unnecessary transactions
    •  @Transactional triggers a check of tx_isolation when the transaction starts
    •  A commit at the end persists changes to the DB
    •  Read replicas: how to leverage them in the JDBC connect string, and why use them?
    •  JDBC connection exhaustion
    •  Async + fanout: let the queue provide backpressure
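One way to keep async fan-out from exhausting the JDBC pool is to size the worker pool to the number of connections that work may use and put a bounded queue in front of it, so saturation shows up as a rejected submit the caller can turn into backpressure (for example, leaving messages on the RabbitMQ queue). A minimal Java sketch; the `BoundedFanout` class and its sizing are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Runs DB-bound tasks with at most `connections` in flight and a bounded backlog.
final class BoundedFanout {
    private final ExecutorService workers;

    BoundedFanout(int connections, int backlog) {
        // One worker per connection we're willing to spend; AbortPolicy rejects once
        // the backlog is full instead of growing an unbounded in-memory queue.
        this.workers = new ThreadPoolExecutor(connections, connections, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(backlog), new ThreadPoolExecutor.AbortPolicy());
    }

    // false = saturated: the caller should back off (e.g. stop pulling from the message queue).
    boolean trySubmit(Runnable dbTask) {
        try {
            workers.execute(dbTask);
            return true;
        } catch (RejectedExecutionException saturated) {
            return false;
        }
    }

    void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BoundedFanout fanout = new BoundedFanout(4, 100);
        int accepted = 0;
        for (int i = 0; i < 10; i++) {
            if (fanout.trySubmit(() -> { /* hypothetical per-device DB update */ })) {
                accepted++;
            }
        }
        System.out.println("accepted " + accepted + " of 10");
        fanout.shutdown();
    }
}
```

The design choice is to fail fast at the edge rather than let every thread block waiting for a pooled connection.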
  14. Explicitly setting transactional = false
    import org.springframework.transaction.annotation.Transactional

    class FooService {
        static transactional = false

        @Transactional
        String getFoo() {
            return "foo"
        }

        String getBar() {
            return "bar"
        }
    }
  15. Transactional puzzler #1
    import org.springframework.transaction.annotation.Transactional

    class FooService {
        static transactional = false

        String getFoo() {
            return getBar()
        }

        @Transactional
        String getBar() {
            return "bar"
        }
    }
    Is getBar() transactional when called from getFoo()?
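Why the answer to puzzler #1 is no: Spring applies its `@Transactional` through a proxy by default, so only calls that enter through the proxy are intercepted, and `getFoo()` calls `getBar()` on `this`, bypassing it. The sketch below reproduces the effect with a plain JDK dynamic proxy standing in for Spring's transaction interceptor; the `FooService` interface and `SelfInvocationDemo` class are illustrative, not Spring or Grails APIs.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface FooService {
    String getFoo();
    String getBar();
}

final class FooServiceImpl implements FooService {
    public String getFoo() {
        return getBar();   // self-call: goes straight to `this`, never through the proxy
    }

    public String getBar() {
        return "bar";
    }
}

final class SelfInvocationDemo {
    // Records which calls passed through the proxy; a real transaction interceptor
    // would begin and commit a transaction at this point.
    static final List<String> intercepted = new ArrayList<>();

    static FooService proxy(FooService target) {
        InvocationHandler handler = (p, method, methodArgs) -> {
            intercepted.add(method.getName());
            return method.invoke(target, methodArgs);
        };
        return (FooService) Proxy.newProxyInstance(
                FooService.class.getClassLoader(), new Class<?>[] {FooService.class}, handler);
    }

    public static void main(String[] args) {
        FooService service = proxy(new FooServiceImpl());
        service.getFoo();
        System.out.println(intercepted);   // prints [getFoo]: the inner getBar() was never intercepted
    }
}
```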
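  16. Don't use the springframework @Transactional
    import grails.transaction.Transactional

    class FooService {
        static transactional = false

        String getFoo() {
            return getBar()
        }

        @Transactional
        String getBar() {
            return "bar"
        }
    }
    Now getBar() will always be transactional: Grails' @Transactional is applied by an AST transformation at compile time, so it also covers calls from within the same class.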
  17. readOnly configuration
    import grails.transaction.Transactional

    class FooService {
        static transactional = false

        @Transactional(readOnly = true)
        String getFoo() {
            return getBar()
        }
    }
  18. Transactional Puzzler #2
    import grails.transaction.Transactional

    class FooService {
        static transactional = false

        @Transactional
        String getFoo() {
            return getBar()
        }

        @Transactional(readOnly = true)
        String getBar() {
            return "bar"
        }
    }
    Is getBar() readOnly when called from getFoo()?
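  19. Propagation
    import grails.transaction.Transactional
    import org.springframework.transaction.annotation.Propagation

    class FooService {
        static transactional = false

        @Transactional
        String getFoo() {
            return getBar()
        }

        @Transactional(readOnly = true, propagation = Propagation.REQUIRES_NEW)
        String getBar() {
            return "bar"
        }
    }
    Now getBar() will always be readOnly. Note that REQUIRES_NEW suspends the caller's transaction and starts a new one, which typically takes a second connection from the pool.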
  20. Metrics
    •  Dropwizard metrics for meters, timers, and histograms
    •  Tuning for the 99th percentile
    •  We primarily use the 1-minute rate, the mean, and the 99th percentile
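To make those three numbers concrete, here is a sketch computing a mean, a nearest-rank 99th percentile, and a one-minute exponentially weighted rate. The rate follows the scheme Dropwizard's `EWMA` uses (a 5-second tick with alpha = 1 - e^(-5/60)), but the `LatencyStats` and `OneMinuteRate` classes are illustrative and, unlike Dropwizard's, not thread-safe; Dropwizard histograms also estimate percentiles from a sampling reservoir rather than sorting every sample.

```java
import java.util.Arrays;

final class LatencyStats {
    private LatencyStats() {}

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // Nearest-rank percentile over all samples (exact, O(n log n)).
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}

// One-minute exponentially weighted moving average of an event rate.
final class OneMinuteRate {
    private static final double TICK_SECONDS = 5.0;
    private static final double ALPHA = 1 - Math.exp(-TICK_SECONDS / 60.0);

    private long uncounted;
    private double ratePerSecond;
    private boolean initialized;

    void mark(long events) {
        uncounted += events;
    }

    // Call once every TICK_SECONDS (e.g. from a scheduler).
    void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialized) {
            ratePerSecond += ALPHA * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate;   // the first tick seeds the average
            initialized = true;
        }
    }

    double ratePerSecond() {
        return ratePerSecond;
    }

    public static void main(String[] args) {
        long[] latenciesMs = {12, 15, 11, 240, 14, 13, 16, 12, 18, 15};
        System.out.println("mean=" + LatencyStats.mean(latenciesMs)
                + " p99=" + LatencyStats.percentile(latenciesMs, 99));
    }
}
```

The example data shows why the 99th percentile matters: one slow request barely moves the mean but dominates the tail.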
  21. Leveraging caches
    When should you start adding caching? Cache invalidation is hard to do well, so be careful about optimizing prematurely.
    So you actually need to cache?
    •  Client side vs server side (mobile clients)
    •  Distributed vs in-memory caches (far vs near)
    •  Near cache miss -> far cache miss -> RDS
  22. Distributed caches (far caches)
    Running in AWS ElastiCache:
    •  Redis
    •  Memcached
    Which one to choose after using both? We actually still run both, as each fits a need.
  23. In-memory caches (near caches)
    A near cache lives in memory on the same box as the client:
    •  Guava Cache (LoadingCache)
    •  ConcurrentHashMap
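The near -> far -> RDS lookup order can be sketched as a two-tier read-through cache. Assumptions: `FarCache` is a hypothetical interface standing in for a Redis or Memcached client, the `database` function stands in for an RDS query, values are non-null, and the `ConcurrentHashMap` near tier is unbounded with no expiry (Guava's `LoadingCache` would add size limits and expiry in practice).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a Redis/Memcached client (the "far" cache).
interface FarCache<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
    void remove(K key);
}

// In-memory FarCache used only to make the sketch runnable.
final class InMemoryFarCache<K, V> implements FarCache<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    public Optional<V> get(K key) { return Optional.ofNullable(data.get(key)); }
    public void put(K key, V value) { data.put(key, value); }
    public void remove(K key) { data.remove(key); }
}

// Near (in-process) cache -> far (distributed) cache -> database, read-through.
final class TieredCache<K, V> {
    private final Map<K, V> near = new ConcurrentHashMap<>();
    private final FarCache<K, V> far;
    private final Function<K, V> database;   // must not return null

    TieredCache(FarCache<K, V> far, Function<K, V> database) {
        this.far = far;
        this.database = database;
    }

    V get(K key) {
        V value = near.get(key);
        if (value != null) {
            return value;                      // near hit: no network round trip
        }
        value = far.get(key).orElse(null);
        if (value == null) {
            value = database.apply(key);       // both caches missed: go to the DB
            far.put(key, value);
        }
        // Concurrent misses may load twice; putIfAbsent keeps the first value stored.
        V existing = near.putIfAbsent(key, value);
        return existing != null ? existing : value;
    }

    // Drops the entry here and in the shared far cache. Other instances' near caches
    // stay stale until they are told to evict, which is why invalidation is hard.
    void invalidate(K key) {
        near.remove(key);
        far.remove(key);
    }

    public static void main(String[] args) {
        TieredCache<String, String> cache =
                new TieredCache<>(new InMemoryFarCache<>(), id -> "device:" + id);
        System.out.println(cache.get("1"));
    }
}
```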
  24. JVM Tuning with Groovy
    Groovy may define classes at runtime: every time you run a script, one or more new classes are created, and without class unloading they stay in PermGen forever.
    -XX:+CMSClassUnloadingEnabled
    Allows the GC to sweep PermGen too and remove classes that are no longer used. Needed for Java 7, not needed in Java 8.
  25. Be aggressive with soft references
    -XX:SoftRefLRUPolicyMSPerMB=125
    The default value is 1000, i.e. one second per MB of free heap. A lower number means soft references are cleared more aggressively.
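As a worked example of what this flag changes: HotSpot keeps a softly reachable object alive if it was last accessed within the window below, where the server VM measures free space against the maximum heap (-Xmx). The 2,048 MB of free heap is an assumed figure, purely for illustration:

```latex
t_{\text{keep}} = \text{free heap (MB)} \times \texttt{SoftRefLRUPolicyMSPerMB}

\begin{aligned}
\text{default:}\quad & 2048 \times 1000\ \text{ms} = 2{,}048{,}000\ \text{ms} \approx 34\ \text{min} \\
\text{tuned:}\quad   & 2048 \times 125\ \text{ms} = 256{,}000\ \text{ms} \approx 4.3\ \text{min}
\end{aligned}
```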
  26. Explicit heap sizing
    -Xms4G (initial/min heap size)
    -Xmx4G (max heap size)
    -XX:MaxPermSize=2G (<= Java 7)
    -XX:PermSize=2G (<= Java 7)
    -Xmn1G (new gen size)
    -XX:SurvivorRatio=8
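With these flags the generation sizes work out as follows (1G = 1024 MB). -XX:SurvivorRatio=8 makes eden 8 times the size of each of the two survivor spaces:

```latex
\begin{aligned}
\text{young} &= \text{eden} + 2 \cdot \text{survivor} = 1024\ \text{MB} \\
\text{eden} &= 8 \cdot \text{survivor} \;\Rightarrow\; \text{survivor} = \tfrac{1024}{10} = 102.4\ \text{MB}, \quad \text{eden} = 819.2\ \text{MB} \\
\text{old} &= 4096 - 1024 = 3072\ \text{MB}
\end{aligned}
```

On Java 7, PermGen (-XX:PermSize / -XX:MaxPermSize) is sized outside -Xmx, so these settings reserve roughly 4G + 2G before thread stacks, the code cache, and other native memory are counted.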
  27. Rate Limiting
    Effectively shed load to relieve backpressure:
    •  Device execution
    •  SmartApp execution
    •  User API execution
    •  Etc.
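Per-key token buckets are one common way to shed load like this; the slide doesn't say which algorithm SmartThings used, so treat the following as a sketch. `TokenBucketLimiter` and the keys in it are hypothetical: each key (a device, SmartApp, or user) may burst up to `capacity` requests, is refilled at `refillPerSecond`, and a request that finds its bucket empty is rejected rather than queued. Buckets are never evicted here; a production version would need expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Per-key token bucket: each key may burst up to `capacity` requests and is then
// limited to `refillPerSecond` sustained requests per second.
final class TokenBucketLimiter {
    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    private final double capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;   // System::nanoTime in production, fake in tests
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    private static final class Bucket {
        double tokens;
        long lastRefillNanos;

        Bucket(double tokens, long now) {
            this.tokens = tokens;
            this.lastRefillNanos = now;
        }
    }

    TokenBucketLimiter(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
    }

    // true = run the work; false = shed it (drop, defer, or answer HTTP 429).
    boolean tryAcquire(String key) {
        Bucket bucket = buckets.computeIfAbsent(key, k -> new Bucket(capacity, nanoClock.getAsLong()));
        synchronized (bucket) {
            long now = nanoClock.getAsLong();
            long elapsed = Math.max(0L, now - bucket.lastRefillNanos);
            bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * refillPerSecond / NANOS_PER_SECOND);
            bucket.lastRefillNanos = now;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        TokenBucketLimiter perDevice = new TokenBucketLimiter(5, 1, System::nanoTime);
        int allowed = 0;
        for (int i = 0; i < 20; i++) {
            if (perDevice.tryAcquire("device-123")) {
                allowed++;
            }
        }
        System.out.println("allowed " + allowed + " of 20 back-to-back requests");
    }
}
```

Keying by device, SmartApp, or user keeps one noisy source from consuming capacity that everyone else shares.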
  28. When you outgrow your plugins
    The code you write at the beginning of a project won't scale forever, so don't expect your plugins to either.
    Quartz
    •  For system jobs or crons that run a few times a day
    •  Not for running millions of schedules a day
  29. Where do we go from here?
    •  Microservices (business scalability)
    •  Move more high-churn MySQL tables to C* (Cassandra) or Aurora
    •  Auto-scaling based on various platform metrics
    •  Automated blue/green deploys
    •  More GC and performance tuning