Learning from a Year of Security Breaches (medium.com/starting-up-security)
327 points by arice on Dec 20, 2016 | 49 comments



Hi, I wrote this!

To continue a discussion:

  - How does your engineering team track new "debt" after releasing code? (If at all; if not, why not?)
  - Do you pay anyone for centralized logging, or wish you didn't? Are you making it useful?
  - Do you feel like your company is good at managing access when hiring / firing people?
Otherwise thanks for any feedback, I enjoy writing these!


I can only speak for my corner of a very large organisation:

- Technical debt from custom-coded solutions is a known issue across our organisation. The new strategy is to move to market solutions, thereby outsourcing the risk to organisations with (hopefully) better code management than we have. In my corner, technical debt isn't measured accurately enough for my liking.

- Yes, we pay for and use centralised logging. We've actually been through two solutions, and are now moving to a third due to various factors (cost, integrations, speed, out-of-the-box metrics). Integration into the centralised logging system is part of our Request for Tender marking criteria.

- We're relatively good at disabling access after someone leaves. We integrate as much as possible into a central repository; it's just the outlier systems that tend to outlast someone's time in the organisation. Critical systems are absolutely shut down within 24 hours of a leaver departing (usually immediately if they're a bad leaver).

Edit: Formatting


> (hopefully)

I hope you are auditing the code of those external orgs.


When you use SaaS products, auditing the code is not a service they offer. You have to rely on certifications from independent certifying organisations, etc.


Part of the goal is to establish an arm's-length relationship to lower legal liability.


Which logging systems did you like/not like?


- AlienVault: OK... we probably didn't get its full potential here.

- HP ArcSight: Extremely powerful, especially the normalizing of logs across similar systems. Requires a team to manage, though.

- Splunk: Our business isn't ready for cloud-based hosting of centralised logs; otherwise we'd be on this already, purely for the reduction in complexity of pulling useful information (not just security), from my perspective.


Thanks for writing this, really insightful! A question: What's your advice on how to store secrets on the server-side?

Currently, I mainly use a separate "secrets.yml" file that gets deployed via Ansible and is stored there encrypted using Ansible Vault with a strong password. Is that a reasonable approach? What is your opinion on storing secrets in environment variables? Some people advise this over storing them in files, but I have seen cases where environment variables can be exposed to the web client as well.


I don't like the idea of keeping secrets in ENV; I'd limit it to config. Though it's the kind of thing I'd ask other folks about myself to understand the tradeoffs. I see Kubernetes and other things supporting secrets in env variables, so I'm unsure how common the practice is.

The big win is simply keeping secrets out of source code, out of a general engineer's copy/paste buffer, and out of errors that get shipped to a logging platform with single-factor access. Your likelihood of a short-term incident decreases dramatically, especially if those secrets have well-segmented access (i.e., not a single AWS key with `AdministratorAccess` everywhere).


If your code adopts a convention of reading secrets from the environment, you get a lot of flexibility in how they're actually stored; you can put them in protected files and export the contents of the file before running the service, or you can have a tool that works like "env" that populates from a secret store. Your secret storage system can get more sophisticated without your code having to change.

I wouldn't recommend putting them in /etc/environment or /etc/profile or /home/service/.profile where you'll forget about them, though.

Just as a strategy for passing secrets to code, I like the environment a lot.
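
As a minimal Python sketch of that convention (the variable name and fallback path are hypothetical, not a prescribed layout):

    import os

    def get_secret(name, fallback_path=None):
        """Read a secret from the environment, optionally falling back to a
        protected file (e.g. one a wrapper script exported before exec)."""
        value = os.environ.get(name)
        if value is not None:
            return value
        if fallback_path and os.path.exists(fallback_path):
            with open(fallback_path) as f:
                return f.read().strip()
        raise RuntimeError("secret %s not provided" % name)

    # The app never cares *how* DB_PASSWORD reached the environment: a
    # protected file sourced by an init script, or an env-like wrapper
    # that populated it from a secret store. (Names are hypothetical.)
    db_password = get_secret("DB_PASSWORD", "/etc/myapp/db_password")

The storage side can then get more sophisticated (protected files, a secret store) without this code changing.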


Article and discussion re storing secrets in environment variables: https://news.ycombinator.com/item?id=8826024

The gist seems to be that it's easy to accidentally leak environment variables (which is why I think the top comment is off-base). tptacek, do you think this risk is overblown?


It's good to be aware of the fact that environments are inherited by child processes (as are file descriptors), but I don't think that's a good reason to avoid using the environment.


Have you heard of torus.sh keyrings? I don't know how well it works for an organization, but integrating torus into my side projects has been painless.


Great article, shared it with my coworkers.

- poorly, really.

- For network and security stuff, absolutely: Splunk is the bee's knees. For apps, each team tends to run its own mix (graylog2/ELK/custom). I've pushed for more security-type events from apps into Splunk for correlation, but it just costs too damn much.

- depends on the region. I find US / UK do okay, but the more emerging/growth markets where we have employees, the worse it gets.


You said this: "Rarely do I see a team eliminate all of their debt, but the organizations _that least respect_ their debt never get so far behind that they can no longer be helped in a breach."

Do you mean instead "that _at_ least respect"?

I ask only because the two have different meanings.


Yep, fixing


Thank you for writing these. These blog posts are my go-to resources when my client companies want to learn more about what they can do to improve their security posture long term. It's a really great series.


Where do I start on centralized logging? I'm primarily an application developer, deployment isn't my strong suit. My hair is on fire at my current startup. There's a ton to do, we're trying to launch several new major efforts in January. What's a good plug and play solution that I don't have to think about?

Are there hosted installs of Elasticsearch/Logstash/Kibana? Is ELK even what I want?

Every time I start looking at centralized logging stuff it seems like a rabbit hole of problems we're too small to be worrying about, stuff that's not shipping features on my app.


You have a lot of decent options. You could do a lot worse than ELK. If you're on AWS, you can get hosted Elasticsearch. It comes with Logstash you can hook up to DynamoDB, and it does Kibana out of the box. There are a number of other vendors, but there are decent reasons for keeping your logs as close to the rest of your infrastructure as possible.

CloudWatch works fine too, and it comes integrated with AWS services out of the box. It can be more annoying to get your logs into than ELK (the latter seems more popular overall). Its alerting and AWS CLI integration are pretty slick, though.
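
Concretely, pushing an event into CloudWatch Logs with boto3 looks roughly like this (the group/stream names are made up; this is a sketch, not a pipeline):

    import time
    import boto3

    logs = boto3.client("logs")
    group, stream = "/myapp/prod", "web-1"  # hypothetical names

    # Create the group/stream once; ignore "already exists" on re-runs.
    for fn, kwargs in [
        (logs.create_log_group, {"logGroupName": group}),
        (logs.create_log_stream,
         {"logGroupName": group, "logStreamName": stream}),
    ]:
        try:
            fn(**kwargs)
        except logs.exceptions.ResourceAlreadyExistsException:
            pass

    logs.put_log_events(
        logGroupName=group,
        logStreamName=stream,
        logEvents=[{"timestamp": int(time.time() * 1000),
                    "message": "user login from 203.0.113.7"}],
    )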

You should also go turn on CloudTrail right now. It lets you automatically log side-effectful API calls. It is not a replacement for a centralized logging pipeline, but it's great high-signal data to put into one.

I appreciate that your complaint (totally valid!) was "this is a rabbit hole", and I just gave you two options, and that might not help your perception that it's a rabbit hole. If you find yourself paralyzed by choice, either choice is much better than deferring the choice! Just pick one. Heck, if you can't pick, let me help: pick AWS hosted Elasticsearch.

A lot of people (also in the security space) like Splunk. I find it annoying to deploy (more than once I've heard rsyslog-in-front-of-forwarders described as the canonical deployment method for just ingesting syslog, because reasons) and overpriced. YMMV.

Disclaimer: shameless plug! You're not the only one with your hair on fire. One of the first things we're doing for Latacora customers is setting up a centralized logging pipeline.


I second ELK, and I even stronger-second Splunk being overpriced (with the caveat that if you do deploy it, I think it's the best option, just not really worth it).

I think it's really important to internalize the idea that there is no Platonic ideal of a logging solution. It's a fundamentally frustrating manifestation of entropy that you're going to wrestle with, but it's a really necessary goal to work towards long term. Sort of a "the first step is admitting powerlessness" kind of deal.


I've had good luck with CloudWatch and, if you're on AWS, I'd recommend it over any other hosted log system (with the possible exception of a more elaborate ELK setup that you build yourself).

The trick to Cloudwatch is --- like most AWS services --- never using the web UI.


That's a good point! If you have someone consuming it that wants a (shared) web UI, you want Kibana. If they prefer to consume their text in a terminal and are fine with typing `aws logs` a bunch, CloudWatch is fine (and probably a little less twiddly than ELK).


These are such great comments, thanks for sharing your insights. For folks looking for other options, I'd also mention https://honeycomb.io, perhaps the most promising newcomer in this space. It's essentially Facebook's Scuba for the rest of us.


It depends a LOT on how many machines, and how many services on those machines, you're dealing with. There's a remarkable amount of stuff at the small end which is good, cheap, and fast to deploy.

I've been using Loggly for my personal machines (~8, mostly cloud VPSes). On the plus side, it's free at my scale, and the analysis and reporting tools are nice, at least in theory. On the minus side, I can't get my logs past 7 days archived to S3 without paying $150/month, and archiving is something I really want since my main use case is longer-term analysis and forensics.

I'm planning to switch to Papertrail, which for the princely sum of $7/mo will give me a simpler UI and a year's archiving to S3.

Loggly and Papertrail both use the same deployment strategy (you hook them up to syslog and/or your app's logging package), and I had Loggly up and running and providing useful feedback in solidly under four hours.


For small to medium log volume, I can heartily recommend Loggly.

The killer feature for me is searching structured (JSON) logs. Just use the Logstash/Graylog library in the language of your choice and send the logs to Loggly, and you quickly have a logging system where you can zoom in on the logs coming from different subsystems of your codebase, or those produced by a specific user.
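
For illustration, a stdlib-only Python sketch of emitting structured JSON logs; Loggly's own handlers and HTTPS endpoint differ, so treat the transport below as a placeholder:

    import json
    import logging

    class JsonFormatter(logging.Formatter):
        def format(self, record):
            payload = {
                "level": record.levelname,
                "logger": record.name,  # e.g. the subsystem
                "message": record.getMessage(),
            }
            # Attach structured context passed via `extra=`.
            for key in ("user_id", "subsystem"):
                if hasattr(record, key):
                    payload[key] = getattr(record, key)
            return json.dumps(payload)

    handler = logging.StreamHandler()  # swap for a real shipper
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("billing")
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    log.info("invoice created", extra={"user_id": 42, "subsystem": "billing"})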


I would use a SaaS solution. ELK can be cheaper, but it can take tons of time to configure and maintain if you run it yourself. For an early startup, paying under $100/month for a logging solution is a no-brainer vs. spending the time configuring one.

Disclaimer: I work at Sumo Logic. I would recommend https://www.sumologic.com. On top of grep-like searches, you can do analytical searches (SQL on text data).


Here are some SaaS choices:

* Sentry: https://sentry.io/welcome/

* Logentries: https://logentries.com/

* Loggly: https://www.loggly.com/

* Opbeat: https://opbeat.com/

* Papertrail: https://papertrailapp.com/

Sentry is open source and there is even an official up-to-date docker image: https://hub.docker.com/_/sentry/

Loggly published an "Ultimate Guide to Logging": https://www.loggly.com/ultimate-guide/


Please excuse the shameless plug, but since you are asking for one: Striim is a good out-of-the-box centralized logging solution. We use Kafka as the messaging layer (you can either run it on your own Kafka or use our internal one) and Elasticsearch as the storage layer.

We also have streaming log parsers to connect your data. That whole thing about 'creating new alerts in minutes' is trivial in our platform, since everything is based on SQL.

Unlike Splunk or ELK, our solution is based on in-memory streams so you don't have to wait for data to be indexed to fire off alerts on anomalous activity. Feel free to message me to find out more or simply download the product from http://www.striim.com/


TFA aside, centralized logging is super useful for debugging a variety of issues. There are a number of hosted options, and setting them up isn't too hard. It usually involves configuring your application's log device to talk to the remote service, or configuring syslog on your app servers to forward logs to said service.

See https://logentries.com/ for an example
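
The syslog-forwarding variant is only a few lines with Python's stdlib (the host and port below are placeholders for whatever your provider assigns; most hosted services actually want TCP/TLS plus a token, so check their docs):

    import logging
    from logging.handlers import SysLogHandler

    # Point at the hosted service's syslog endpoint (hypothetical values).
    handler = SysLogHandler(address=("logs.example.com", 514))
    log = logging.getLogger("myapp")
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    log.warning("failed login for admin from 198.51.100.23")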


If you want a pretty prepackaged solution, you could do a lot worse than Splunk. They even offer it as SaaS:

https://www.splunk.com/en_us/cloud.html


I'm in the same boat. Looking for recommendations on strong, sturdy, buckets, for bailing water.


This is the best security article I've read in a long time. If you're at a startup right now, drop most things and take a few minutes to read it carefully.


Ugh. A good and scary reminder of what's lurking around the corner for any of us at any time - including holidays and vacations (Linode's holiday attack last December comes to mind). IMHO, the emotional impact of breaches on the staff who respond to them is under-discussed. The author touches on it here:

> The discovery of a root cause is an important milestone that dictates the emotional environment an incident will take place in, and whether it becomes unhealthy or not.

> A grey cloud will hover over a team until a guiding root cause is discovered. This can make people bad to one another. I work very hard to avoid this toxicity with teams. I remember close calls when massive blame, panic, and resignations felt like they were just one tough conversation away.


Had the pleasure of working with Ryan when he was at FB--he's one of the best.


You were part of 'Red Team' incidents[1]? I can only imagine the panicked sysadmins running around like crazy but, jokes apart, this is the best way to train a team's incident response I've seen.

[1] https://medium.com/starting-up-security/red-teams-6faa8d95f6...


It's interesting to see press leaks highlighted here as a pattern for insider threat. I don't doubt the author that this is so for the limited scope of organizations considered (SFBA tech companies), but I've worked on several insider cases, had insight into many more, and it's almost always an employee or ex-employee, with an axe to grind, taking trade secret information to a new job at a competitor. In many instances, the competitor has no idea and is pissed when they find out.

One piece of advice that I'd give out with such cases is to listen to your Spidey Sense. A lot of organizations will say, after the fact, "well... something didn't seem right with Bob...". If you sense something isn't right, prepare to secure evidence and analyze it. Don't put IT assets back into circulation if there's doubt, and don't sit on it.


This is solid advice. To illustrate a little based on my own experiences and goals this year:

- Yes, centralized logging is the biggest thing. What you put into it matters; queryability matters; but nothing matters as much as having that centralized logging pipeline to begin with. Once you have it, you can start adding other relevant metadata: host config states, API calls, et cetera.

- Giving employees a budget to buy the device they want is probably a better idea than BYOD. Strong password policies still matter. If it's BYOD, you probably still want to bring the device into policy. That can include physical rules (only do work on the VPN or from the office) and software ones (you can use any device you want, but it has to be running our osqueryd or whatever). Unfortunately, visibility becomes a double-edged sword: there are good legal and ethical reasons for not wanting to see everything on an employee's laptop. (Overall, I think BYOD is a bad idea for most companies.)

- 2FA is pretty cool. It doesn't just address the usual "bad/compromised password" threat model -- it also typically makes it a lot harder for employees to mismanage their credentials (e.g. re-use the same SSH keys and have their personal box be compromised). For some reason, having it around seems to remind developers that you can make users re-authenticate for important/unusual actions (a sketch follows this list) -- you don't just have to count on the ambient authority of a session cookie.

- We'd all like to imagine that we're going to be attacked by space-alien 0day ninjas. Realistically, the main vector is an employee (rogue or confused deputy). Trainings are boring and don't work. Signature-based detection gets outdated pretty quickly. I've done a little work on faster analysis tools -- I'm hoping we get a lot better at unobtrusively protecting people from even spearphishing in the next few years. (The tools we're building at Latacora are ready to beat a lot of attacker tactics right now, but I think we have an arms race ahead of us. Boring domain generation algorithms still aren't detected by most organizations, so there's not a lot of evolutionary pressure.)

- I have no idea if we'll get better at quantifying metrics for debt and security risk. I did a little research into this, and it's a wide-open field. You can get decent high-level reports with a "DEFCON number", but most of these models are not sophisticated in the sense you'd expect actuarial tables to be. And that's what they should be! It's revenue-at-risk! Fortunately, step one is getting all of that data into that centralized logging pipeline, and security professionals seem to mostly agree that's what you do first, so hopefully we get better here.
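
On the re-authentication point above, a rough sketch of step-up auth with the pyotp library (the flow and names are invented for illustration):

    import pyotp

    # Stored per-user at 2FA enrollment time (hypothetical storage).
    user_totp_secret = pyotp.random_base32()

    def require_step_up(user_supplied_code):
        """Gate a sensitive action (key rotation, payouts, deletes) on a
        fresh TOTP code instead of the ambient session cookie."""
        totp = pyotp.TOTP(user_totp_secret)
        if not totp.verify(user_supplied_code, valid_window=1):
            raise PermissionError("re-authentication failed")

    # e.g. call require_step_up(form_code) before dropping a prod table.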


Logging everything is a great idea, but only if you read the log data. Target installed a system to monitor for certain kinds of security hacks, which wrote to their log files. The logging was turned off because the high number of warnings was cluttering up the logs. Of course, the logging was telling them they were being hacked, and they ignored it for months, leading to all sorts of business disasters.


Great article! For those interested in security debt and how it relates to startups, I wrote this in 2011: https://www.veracode.com/blog/2011/02/application-security-d... and presented it that year: https://www.youtube.com/watch?v=MKdiiXgvz_U This predates, by a year, the referenced security debt presentation, which has much of the same material, uncited.


> I wasn’t roped into a single intrusion this year at any companies with completely role driven environments where secrets were completely managed by a secret store.

> This can either mean one of a few things: These environments don’t exist at all, there aren’t many of them, or they don’t see incidents that would warrant involving IR folks like myself.

What are these secret stores? Do they exist?


In general, secret stores "manage secrets so that you don't have to". That can mean a few things, depending on who's using the term.

Sometimes, it's as simple as a shared password store (I've used one powered by GPG, for example). This is better than a YOLO password policy, but not by much: humans still see individual keys.

If you want to be really fancy, you authenticate the human and then decide what they get to do, in a centralized fashion. This is often tricky to do, because you either don't have the funds to do that if you're small, or you have too many services to interact with if you're big. (Many organizations get pretty close -- I'm told that the DoD pretty much authenticates everything with smart cards, for example.)

Sometimes, it means a more automated system where software authenticates instead of a human and gets, e.g., a certificate. Usually it's still the same certificate every time, though; so the main difference is that a machine rather than a human is doing the authenticating.

Sometimes, it means an HSM (hardware security module). These are secure physical devices that perform cryptographic operations for you, so that the key stays on the device.


You still need a secret to access the secret store... so steal that secret, then steal the secrets in the store.

I fail to see how that's more secure. (Though I can understand that it's less bad than a YOLO policy.)

> Many organizations get pretty close -- I'm told that the DoD pretty much authenticates everything with smart cards, for example.

I've been at a place with RSA SecurID (smart card and OTP) + active directory account as SSO authentication for everything (use one or both for 2FA). It was nice and well done.


You made that point elsewhere in the comments, and I replied to it there. For the benefit of other people wondering why a secret store _isn't_ just robbing Peter to pay Paul: https://news.ycombinator.com/item?id=13224802


For example, Hashicorp Vault[0]

[0] - https://www.vaultproject.io/intro/index.html
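
To give a feel for the model, a sketch using the hvac Python client against Vault's KV v2 engine (URL, token, and path are placeholders; real deployments authenticate via AppRole, cloud IAM, etc., not a hardcoded token):

    import hvac

    # Just to show the shape of the API -- don't hardcode tokens.
    client = hvac.Client(url="https://vault.example.com:8200", token="s.xxxx")

    secret = client.secrets.kv.v2.read_secret_version(path="myapp/db")
    db_password = secret["data"]["data"]["password"]  # hypothetical key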


Then people need secrets to access the secret store and you're back at square one ;)


There are plenty of things a secret store still buys you.

- It knows how to encrypt and store secrets securely. Having one specialized application with an opinion on how to do that is much better than having a hundred that do it incidentally. The central one will be audited and monitored; the hundred will invariably mess it up.

- It tracks who accessed a secret and when. This is critical information for remediation and ongoing scope reduction. Knowing who accessed what, when gives you the context for why; all three tell you how to further reduce the authority that application has.

- It can generate "minimal" credentials on-demand. I.e. a new key that only lets you access what you need and for a limited amount of time.

- It can encrypt things on behalf of the requester, such that the requester never sees the key. That is good because it can be one-way. It is also good because if a service is compromised, the compromise may be detected and remediated (access revoked) before all the data is dumped. Having the secret store in the path lets you do rate limiting and centralized monitoring, for example. (A sketch of this pattern follows the list.)

- Secret stores can know how secrets are linked; making it easier to do revocation, and easier to determine the impact of a breach or misuse incident.
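
As referenced in the list, the encrypt-on-behalf pattern looks roughly like this with Vault's transit engine via hvac (the key name and URL are hypothetical); the application only ever holds ciphertext:

    import base64
    import hvac

    client = hvac.Client(url="https://vault.example.com:8200", token="s.xxxx")

    # Transit expects base64-encoded plaintext.
    plaintext = base64.b64encode(b"4111 1111 1111 1111").decode()
    result = client.secrets.transit.encrypt_data(
        name="payments-key", plaintext=plaintext)
    ciphertext = result["data"]["ciphertext"]  # "vault:v1:..."

    # Decryption is a separate, separately audited, revocable call:
    # client.secrets.transit.decrypt_data(name="payments-key",
    #                                     ciphertext=ciphertext)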


There are a number of secret stores. Some more basic ones resemble password managers on steroids, with audit logs of who checked out what and when. Or you can go to a full HSM (hardware security module) that totally isolates secrets (keys) from secret users (actual users, application code, etc.). HSMs allow you to sign or encrypt without ever having the keys in hand. It's hard to accidentally leak a secret you never had in the first place.
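
A managed flavor of that, sketched against AWS KMS with boto3 (the key alias is made up); the private key never leaves the HSM-backed service:

    import boto3

    kms = boto3.client("kms")

    signature = kms.sign(
        KeyId="alias/myapp-signing",  # hypothetical asymmetric key
        Message=b"release-artifact-digest",
        MessageType="RAW",
        SigningAlgorithm="RSASSA_PKCS1_V1_5_SHA_256",
    )["Signature"]
    # The application gets a signature back but never sees key material.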


I'm surprised credential theft is still the lowest-hanging fruit.

I thought banks seemed to have solved a lot of that.


Banks are in the business of managing financial risk for their customers, and they have enough money to eat a lot of risk before it becomes a problem for them. Other business models with less money in them do not have the same kind of resources.




