Those who work closely with me understand that "Big Data" is not my favorite phrase. The hype is certainly big, even if the specifics are not. Semantics aside, there's a lot to love in the world of data right now. The way that we THINK about using data is definitely changing, or at least being refined. We have to prioritize what data we're going to analyze. (Hint: start with what's available...) HOW we analyze and consume data is really driving the true change.
Big Data initiatives, or even Data Analysis initiatives, tend to fall into two buckets: real and imaginary.
Imaginary data projects sound really cool. They involve grandiose plans and new technologies. Perhaps you'll even have a team of engineers with all the skills and bandwidth necessary to help you reach all your goals. But they're not realistic. You probably don't have the right data immediately available, the right technology is not available to you, and your developers are spending all their time writing code for existing applications. There is hope, though. Unlike imaginary numbers, imaginary data projects can become reality — just not overnight.
On the flip side, real data projects have some traction. Like their real number counterparts, they can be rational or irrational. Just because you CAN launch an initiative to analyze piles of data "with Hadoop" doesn't mean that it's necessarily a good idea today. An overly ambitious data project, launched with unrealistic expectations about time to production, impact on the business, and adoption by users, is an irrational data project.
A real data project is one that can have an impact right now, with data that you have right now, developed by resources that you have (or can have) right now. (I hadn't intended to channel Van Halen just then; it just happened.) The impact on the business may not be as large as you were hoping, but a successful, completed project will always have more impact than a project that never really got off the ground, regardless of how altruistically ambitious it was. A real data project will also likely be better planned, with a better user story, if you will. A real data project may actually fit your idea of a "big data" project, but that's not required.
Want to make your real data project even more...real? Make the information available in real time. The sooner data is available to users, the more valuable it is. In fact, an individual data item likely carries the most business value the moment it is created. Focusing on real-time availability of data brings scalability concerns right along with it, so plan to handle piles of data, even if the initial project doesn't call for it.
Here's a great video by Scott Jarr from VoltDB, "Breaking Down the Database Universe," my source for the concept that a single data item's greatest value is when it is first created.