Shutterstock has a nearly insatiable appetite for data storage. From its inception, the company, a global provider of licensed photographs, vectors, illustrations, and videos, refused to pay higher prices just to stuff its storage needs into somebody else’s cloud. Instead, the almost 10-year-old image warehousing operation built its own server farm and developed its own cloud software in-house.
Shutterstock’s storage appetite continued to grow. Towards the end of 2012, it stored some 20 million images, and since then it has added an average of 10,000 images per day. Thanks to open-source technology, however, running its own cloud storage facility has kept the company’s operations budget on a diet.
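The two growth figures in the paragraph above imply a simple linear projection. The sketch below assumes the 10,000-per-day average holds constant, which the article does not claim; the constant names and the `projected_library_size` helper are illustrative only.

```python
# Back-of-the-envelope projection of the image library size, using the two
# figures quoted in the article. A constant daily rate is an assumption.
BASELINE_IMAGES = 20_000_000   # approximate count, end of 2012
DAILY_ADDITIONS = 10_000       # average images added per day


def projected_library_size(days_elapsed: int) -> int:
    """Approximate library size `days_elapsed` days after the baseline,
    assuming a constant daily intake (a simplification)."""
    if days_elapsed < 0:
        raise ValueError("days_elapsed must be non-negative")
    return BASELINE_IMAGES + DAILY_ADDITIONS * days_elapsed


if __name__ == "__main__":
    for days in (30, 180, 365):
        print(f"after {days:>3} days: ~{projected_library_size(days):,} images")
```

At that rate a single year adds about 3.65 million images, taking the library from roughly 20 million to roughly 23.65 million.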
For instance, Shutterstock relies heavily on low-cost chips and other hardware geared toward open-source software standards to keep costs down. This approach gives Shutterstock flexible, automated image handling and lets it scale its storage far more efficiently. Add to that the company’s use of the open-source MySQL database, and the result is a near-perfect home-brewed cloud.
As a result, the company has saved 90 percent on its technology budget because it can use similar storage building blocks from other vendors, Chris Fischer, Shutterstock’s vice president of technical operations, told LinuxInsider. This lets the company use the same open-source tools that Google and Amazon rely on, while adding the extra support a smaller organization needs.
In this interview, Chris Fischer tells LinuxInsider about the trade-off any enterprise must make in choosing open source over a commercial or proprietary cloud platform.
LinuxInsider: Why choose an open-source platform over a commercial or proprietary option when starting out?
Chris Fischer: Simple. Many of the tools were open-sourced already and obviously inexpensive. The founders relied on the community to provide support and guidance. Given the prominence of open-source modules, you could learn a lot about building almost any type of system you could dream up.
We were drawn to open source for its flexibility and cost efficiency. We dabbled with a few proprietary products but found them unsuccessful, and now we have no plans to deviate from open source. I see open source as giving the company the greatest capacity to keep scaling the system. It also opens the company up to the largest pools of developer talent.
How do the open-source products you use compare technologically to the proprietary software you passed up?
Fischer: There is a huge tech edge with open source. With paid software, you often run into usage problems that the vendor cannot solve for your particular user base. With open source, the huge community eliminates that hurdle.
Some of the most mature databases are open source, and so are the most mature Web servers on the market. Considering the level of maturity and the capabilities of the technology, I would take open source over any proprietary software. Numerous third-party vendors provide support if you want to pay for that service.
What are the key factors to consider in deciding on open source over proprietary platforms?
Fischer: The first thing that comes to mind is flexibility. I have had the pleasure of working at businesses that are always growing, which means constantly making changes to the applications. In that type of environment, flexibility is the No. 1 priority. Many closed-source technologies offer a variety of platforms and features, but I have never seen in proprietary products the flexibility I see in open source. This is especially true when you are trying to solve something new or are facing a unique problem.
What were the tripping points in getting established with the platform you use now?
Fischer: I have never found myself sitting around wishing that I had a proprietary or an enterprise product instead of what I was using. When we hit a problem that was difficult, it was comforting to know that if push came to shove, we could always dive into the code and find a solution.
Understand that we were not using brand-new stuff all the time. The open source we use consists of robust, proven, stable projects that have been developed over the years.
How extensive is your investment into open-source methods in terms of financial outlay and maintenance?
Fischer: We manage all of our own cloud storage onsite at Shutterstock. It is all our gear. We manage the systems and run the software. From a software perspective, our storage stack is open source through and through. It is a true cloud storage system, even though we hold all the pieces locally. It is served over HTTP, and we use standard PUT and GET requests.
I’m not accessing a file. I’m accessing the storage system using the Internet. It just so happens that my Internet is right there. I could give you 20 examples of how we used a Linux project, changed it, and sent it upstream.
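Fischer’s point is that the storage system is addressed like a web service rather than a mounted filesystem: objects go in with an HTTP PUT and come out with a GET, even when the "Internet" is the local network. The sketch below illustrates only that access pattern, using Python’s standard library and a toy in-memory server on localhost standing in for the storage cluster. The key layout (`images/12345.jpg`), the handler, and the helpers `put_object` and `get_object` are hypothetical; this is not Shutterstock’s code or MogileFS’s actual protocol.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Optional

# Toy object store: the URL path is the key, the request body is the object.
# Hypothetical stand-in for a real HTTP-based storage cluster.
_STORE: Dict[str, bytes] = {}
_LOCK = threading.Lock()


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        with _LOCK:
            _STORE[self.path] = body
        self.send_response(201)  # 201 Created
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        with _LOCK:
            body = _STORE.get(self.path)
        if body is None:
            self.send_error(404, "no such key")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server() -> ThreadingHTTPServer:
    # Port 0 asks the OS for any free port on localhost.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_object(base_url: str, key: str, data: bytes) -> int:
    """Store `data` under `key` with an HTTP PUT; returns the status code."""
    req = urllib.request.Request(f"{base_url}/{key}", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def get_object(base_url: str, key: str) -> Optional[bytes]:
    """Fetch `key` with an HTTP GET; returns None if the key is missing."""
    try:
        with urllib.request.urlopen(f"{base_url}/{key}") as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    base = f"http://{host}:{port}"
    print(put_object(base, "images/12345.jpg", b"fake-jpeg-bytes"))
    print(get_object(base, "images/12345.jpg"))
    server.shutdown()
    server.server_close()
```

Because the client only speaks HTTP, the same `put_object`/`get_object` calls would work whether the server sits in the same rack or in a public cloud, which is the sense in which Fischer calls an on-premises system "a true cloud storage system."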
Why run your own cloud — which by itself is akin to operating a separate business — when you could subscribe to an open-source cloud service?
Fischer: We are a cloud-based company. We manage our own cloud. We have the same type of APIs, orchestration, and utilities to enable developers to interact with these systems in a cloud-like fashion. It just so happens that we also know what the hardware is. We rack it. We stack it. We turn it on. When it breaks, we just shut it off, just like you would if you were a cloud provider.
We do all of that because it is potentially less expensive, and we get a higher performance, which is what our application really requires. Those are the two driving factors that led us to manage our own cloud.
Having built your own playbox, so to speak, what does it include?
Fischer: I would describe our hosting environment as a Shutterstock-operated and -designed cloud, built using open-source software and commodity hardware. We run our own data centers, soup to nuts, and we designed our cloud in-house.
So, using your own design might well serve as a how-to guideline for others. Can you expand on the details?
Fischer: Our database hardware is orchestrated using Puppet, MCollective, and an in-house tool named Optopus (we open-sourced the code). We can provision nodes using some custom code and Foreman. Hypervisors run KVM/libvirt virtualization and are orchestrated and managed by Puppet, MCollective, and Optopus. Asset storage uses a combination of white-box servers plus a lot of Coraid disks at the hardware level, with MogileFS, an HTTP-based open-source storage system, doing all the heavy lifting. This works a lot like S3, with HTTP PUTs and GETs, but at a dramatically lower cost.
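MogileFS follows a general design in which a tracker keeps metadata about which devices hold each file, storage nodes serve the bytes over HTTP, and a per-class setting controls how many replicas to keep on cheap disks. The sketch below is a simplified in-memory model of that idea. `Device`, `Tracker`, `min_dev_count`, and the "spread copies across hosts first" placement rule are illustrative assumptions, not MogileFS’s actual API or placement algorithm.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """One disk on a storage node (hypothetical model)."""
    device_id: int
    host: str
    alive: bool = True
    files: Dict[str, bytes] = field(default_factory=dict)

    def url_for(self, key: str) -> str:
        # In this model, storage nodes serve files over plain HTTP.
        return f"http://{self.host}/dev{self.device_id}/{key}"


class Tracker:
    """Keeps metadata only: which devices hold a copy of each key."""

    def __init__(self, devices: List[Device], min_dev_count: int = 2,
                 seed: Optional[int] = None):
        self.devices = {d.device_id: d for d in devices}
        self.min_dev_count = min_dev_count        # replicas per key
        self.index: Dict[str, List[int]] = {}     # key -> device ids
        self._rng = random.Random(seed)

    def store(self, key: str, data: bytes) -> List[str]:
        """Write `data` to min_dev_count distinct live devices, spreading
        copies across hosts first so one dead server can't lose the file."""
        live = [d for d in self.devices.values() if d.alive]
        self._rng.shuffle(live)
        chosen: List[Device] = []
        used_hosts = set()
        # First pass: at most one device per host.
        for d in live:
            if len(chosen) == self.min_dev_count:
                break
            if d.host not in used_hosts:
                chosen.append(d)
                used_hosts.add(d.host)
        # Second pass: if there are fewer hosts than replicas, reuse hosts.
        chosen_ids = {d.device_id for d in chosen}
        for d in live:
            if len(chosen) == self.min_dev_count:
                break
            if d.device_id not in chosen_ids:
                chosen.append(d)
                chosen_ids.add(d.device_id)
        if len(chosen) < self.min_dev_count:
            raise RuntimeError("not enough live devices for requested replicas")
        for d in chosen:
            d.files[key] = data  # stands in for an HTTP PUT to the node
        self.index[key] = [d.device_id for d in chosen]
        return [d.url_for(key) for d in chosen]

    def get_paths(self, key: str) -> List[str]:
        """URLs of live replicas; a client would try them in order."""
        return [self.devices[i].url_for(key)
                for i in self.index.get(key, [])
                if self.devices[i].alive]

    def fetch(self, key: str) -> Optional[bytes]:
        """Return the first readable replica (stands in for an HTTP GET)."""
        for i in self.index.get(key, []):
            dev = self.devices[i]
            if dev.alive and key in dev.files:
                return dev.files[key]
        return None


if __name__ == "__main__":
    devices = [Device(1, "store1"), Device(2, "store1"),
               Device(3, "store2"), Device(4, "store3")]
    tracker = Tracker(devices, min_dev_count=2, seed=7)
    print(tracker.store("images/12345.jpg", b"fake-jpeg-bytes"))
    devices[0].alive = False  # simulate a failed disk
    print(tracker.get_paths("images/12345.jpg"))
```

Keeping metadata separate from the bytes is what lets this kind of system run on white-box servers and commodity disks: when a disk dies, the tracker simply stops handing out its paths, much like Fischer’s "when it breaks, we just shut it off."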
How does this cloud system you use measure up against security concerns that you would face if you farmed out your storage to another cloud provider?
Fischer: I have to take security just as seriously as if I were hosting all of this on Amazon, and we still do some work with Amazon as well. Ultimately, Amazon may be a bigger target for hackers, but we are a target, too. Managing our cloud internally does not make me feel that I am handling security any better. In fact, we have a pretty long track record of using lots of cloud systems. Our entire email suite is on Google Docs, and we use Box, a cloud-based service, for all of our internal IT storage.
So I do not think we view hosting it ourselves as a security advantage. Maybe compliance is easier because you can see directly what is going on, which is a little more convenient, but do I think it is more secure because it is behind my firewall? No! All of the software you would need to keep things secure in a different cloud also exists here.
Given your long history with open source, do you see that technology as nirvana today, or do you see holes in its foundation?
Fischer: For us, using open source means investing in the technology. You need to be ready to give something back, or at least be willing to build more expertise in using it. So, from a strategic perspective, I like to believe that we actually know how our storage engine and our services work, because we are modifying that code and are intimately familiar with it. That helps us attract very high-caliber engineers who solve some really amazing problems.
So what open source requires is a choice to invest time in managing the open-source products. If you use proprietary software, you choose to entrust that to somebody else so you can invest your engineering time elsewhere. It is a trade-off.