Businesses and government agencies are in a race to gather, quantify and clarify an ever-increasing stream of data. Housing the bits and pieces of their digital treasures can be just as much of a problem as deciding whether to trust traditional relational platforms or adopt more flexible databases designed to handle unstructured data.
Solution providers offer a dizzying array of database products and Big Data storage offerings that add complexity to the process. As if all of these options and concerns were not enough, there is also the contest between proprietary and open source products.
Silicon Valley startup Treasure Data built its open source Big Data analytics platform around a subscription-based model. In this interview, LinuxInsider talks with Hironobu Yoshikawa, Treasure Data’s cofounder and CEO, about the market factors driving a growing adoption of both open source technologies and data warehousing in the cloud.
LinuxInsider: The NoSQL, Hadoop and traditional database communities seem to focus on Big Data in different ways. Why are these groups so far apart?
Hiro Yoshikawa: Big Data is about collecting your data in one place and making sense of it. These groups are far apart in terms of their technology, ranging from fully proprietary to 100 percent open source, but they all face the same two questions. First, how do we collect data from a variety of data sources in a manageable way? And second, what value can we derive from the collected data?
LI: The inexorable march toward cloud storage seems to have run into a Big Data detour. With all the talk about cloud storage and Big Data, it is hard to tell which comes first. Does a cloud foundation enable Big Data applications, or did Big Data create a need for cloud storage?
Yoshikawa: Cloud storage offers many advantages over on-premises storage, one of which is critically important for Big Data: less resource provisioning overhead. Essentially, cloud storage is enabling small companies and individual lines of business at big companies to kick off their Big Data projects without asking for a big infrastructure investment upfront.
The idea of cloud storage certainly predates the current hype around Big Data, but we think the demand for scalable Big Data solutions (scaling both out and down) will catapult the demand for better, cheaper, smarter cloud storage.
LI: How is Big Data being used, and what issues must be addressed as a result?
Yoshikawa: The early adopters of Big Data technology are in high tech, especially Web-based software companies, and, broadly speaking, in marketing. This is evident just from looking at our early customers. They are in social gaming, Web startups and advertising technology, and they are using our service to understand their customers better and boost revenue and profit.
LI: How do you think such usage will change over the next few years?
Yoshikawa: Most data out there is not collected, or if it is, it is not in any shape or form amenable to intelligent analysis. As people realize how much value they can get from the data they already have, this will change, especially in non-Web sectors like manufacturing and utilities.
LI: What big misconceptions do enterprises have about Big Data?
Yoshikawa: There are four V's of Big Data that are often talked about: velocity, volume, variety and value. There is a fifth V that many Big Data projects are missing: viability. A lot of Big Data projects fail because they require too much upfront investment (hiring experts, buying hardware, climbing the learning curve) and too much maintenance (hardware obsolescence, technology debt, hiring more experts). While hardware and Big Data-related technologies might become commoditized, the deployment of a successful Big Data solution will not be commoditized anytime soon.
LI: Cloud storage and cloud applications have raised concerns, and for many enterprises those concerns are serious enough that they are putting cloud initiatives on hold for the time being. Are there new security challenges with Big Data?
Yoshikawa: Not necessarily with Big Data per se, but as with anything in the cloud, whether it is Big Data, SaaS or e-commerce, both security and compliance will always be something to think about. That said, we have noticed that a lot of data out there is either publicly accessible or poses no security risk, yet has not been analyzed effectively, and we are seeing our customers put those types of data into our system to make better use of them. In the grand scheme of things, a great deal of data is perfectly fine to put in the cloud. That's the place for everyone to start.
LI: What convinced you not to pursue a proprietary product?
Yoshikawa: The founders love open source and the community involvement. We firmly believe in the power of open source, both in terms of the community and the quality of the software. We believe that enterprise users should be able to fully access and control their data, whether it is in a hosted platform in the cloud or part of their own on-premises databases.
The Treasure Data solution solves one of the main issues: how to deal with massive, heterogeneous data types in both the uploading and the computational processing required to turn disparate data into real information…