By Camuel Gilyadov, on October 17th, 2010

Two Envelopes Problem: Am I just dumb?

It seems the recent craze about statistician being the profession of choice in the future is gaining steam: a future in which we will be surrounded by quality BigData, capable computers and bug-free open source software, including OpenDremel. Well, the last one I made up… but the rest seems to be the current situation. Acknowledging this . . . → Read More: Two Envelopes Problem: Am I just dumb?

By Camuel Gilyadov, on October 13th, 2010

Debunking common misconceptions in SSD, particularly for analytics

1. SSD is NOT synonymous with flash memory.

First of all, let’s settle on terms. SSD is best described as the concept of using semiconductor memory as a disk. There are two common cases: DRAM-as-disk and flash-as-disk. Flash memory is a semiconductor technology pretty similar to DRAM, just with a slightly different set of trade-offs.

. . . → Read More: Debunking common misconceptions in SSD, particularly for analytics

By Camuel Gilyadov, on October 12th, 2010

Google Percolator: MapReduce Demise?

Here are my early thoughts after a quick look at Google Percolator and a skim of the paper.

Major take-away: massive transactional mutation of a dataset of tens of petabytes on a cluster of thousands of nodes is possible!

MapReduce is still useful for distributed sorts of big data and a few other things; nevertheless, its “karma” has suffered a blow. Previously you could end any MapReduce dispute by . . . → Read More: Google Percolator: MapReduce Demise?

By Camuel Gilyadov, on October 11th, 2010

How scalable is the Linux kernel on a 48-core machine?

According to this excellent and comprehensive research, with some kernel hacking a ~33x speedup (compared to a single core) is possible. For example, PostgreSQL running on 48 cores gives ~4x out of the box, and after kernel and PostgreSQL patches are applied it grows to ~33x. Assuming I/O can keep up, of course.
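
To put the quoted figures in perspective, here is a minimal Python sketch that turns the two approximate speedups (about 4 and about 33 on 48 cores) into parallel efficiency and into the serial fraction implied by Amdahl's law. Amdahl's law is my own framing here, not something taken from the research, and the function names are made up for illustration.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction s implied by Amdahl's law.

    Amdahl's law: speedup = 1 / (s + (1 - s) / cores)
    Solved for s: s = (cores / speedup - 1) / (cores - 1)
    Assumption: the workload behaves like a fixed serial part plus a
    perfectly parallel part, which is only a rough model.
    """
    return (cores / speedup - 1) / (cores - 1)


CORES = 48
# Approximate speedups quoted in the post (PostgreSQL on 48 cores).
CASES = [("out of the box", 4.0), ("after patches", 33.0)]

for label, speedup in CASES:
    eff = parallel_efficiency(speedup, CORES)
    serial = amdahl_serial_fraction(speedup, CORES)
    print(f"{label}: efficiency {eff:.1%}, implied serial fraction {serial:.1%}")
```

Under this simple model the patched setup runs at roughly 69% of ideal scaling, versus roughly 8% out of the box.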

By Camuel Gilyadov, on October 11th, 2010

Is NoSQL a DBMS?

Yes, it is.

Proof? – By definition.

But Wikipedia…… – fixed.

By Camuel Gilyadov, on October 8th, 2010

CAP equivalent for analytics?

The CAP theorem deals with trade-offs in transactional systems. It doesn’t need an introduction, unless of course you have been busy on the moon for the last couple of years. In that case you can easily Google for good intros. Here is a Wikipedia entry on the subject.

I was thinking about how I would build an . . . → Read More: CAP equivalent for analytics?

By Camuel Gilyadov, on October 8th, 2010

Analytics Patterns

Unsatisfied with my previous post’s definition of Advanced Analytics, and giving some thought to what advanced methods in analytics are, I realized that the analytics industry lacks a good analytics pattern catalog: a list of common problems followed by a list of common industry-consensus solutions to them. An equivalent of the GoF design patterns for analytics. The . . . → Read More: Analytics Patterns

By Camuel Gilyadov, on October 8th, 2010

Feature list of ultimate BigData analytics

Volume Scalability => the solution must handle high volumes of data, meaning the cost must scale linearly in the range of 10GB – 10PB.

Latency Scalability => the solution must be interactive or batch, and cost must scale linearly in the range of 1 msec – 1 week.

Sophistication Scalability => the solution . . . → Read More: Feature list of ultimate BigData analytics

By Camuel Gilyadov, on October 7th, 2010

Terminology: Analysis vs. analytics advanced analytics

I see a lot of confusion in the usage of newer terms in analytics. I confuse them myself occasionally. I find it funny that an industry as serious as analytics tolerates constant renewal of its basic terminology. Yet, I confess, I’m very guilty of it myself. I do enjoy the freshness and the novelty . . . → Read More: Terminology: Analysis vs. analytics and more…

By Camuel Gilyadov, on October 1st, 2010

The story behind this blog

Continue reading The story behind this blog