Archive for the ‘Uncategorized’ Category

Amazon Redshift

I started hearing about Amazon Redshift and I am quite interested in what they have done. As described here, they have taken the table structure and turned it on its side: instead of storing whole rows together, each column’s values are stored together. As stated: “Using columnar storage, each data block stores values of a single column for multiple rows. As records enter the system, Amazon Redshift transparently converts the data to columnar storage for each of the columns.” This is a great solution, but what the business needs is a combined OLTP and OLAP solution, where we could replace our OLTP with a Hadoop, SAP HANA, or Redshift solution.
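As a concrete illustration of what that columnar layout buys you, here is a minimal sketch of a Redshift table (all table and column names are made up): because each column lives in its own blocks, per-column compression encodings and a sort key let an aggregate query read only the blocks it needs.

-- Hypothetical fact table; names are illustrative only.
CREATE TABLE fact_sales (
    sale_id     BIGINT         ENCODE delta,   -- each column compressed
    sale_date   DATE           ENCODE delta,   -- independently in its own blocks
    customer_id INTEGER        ENCODE lzo,
    amount      DECIMAL(12,2)  ENCODE lzo
)
DISTKEY (customer_id)   -- distribute rows across slices by customer
SORTKEY (sale_date);    -- date-range predicates can skip whole blocks

-- This aggregate touches only the sale_date and amount blocks:
SELECT sale_date, SUM(amount) AS total_sales
FROM fact_sales
WHERE sale_date BETWEEN '2013-01-01' AND '2013-01-31'
GROUP BY sale_date;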

The issue is that all of these solutions get us to near real time, but none of them is a truly integrated solution where an OLTP change equals a data mart change while still allowing business-driven self-service.

Business ideas have to be tested against current solutions and rules, then promoted to agile teams where the cost of implementation is estimated. The business may find that the cost will streamline solutions, or that current solutions do not warrant a redesign or a quick fix. If so, all team members have to be brought into a sprint to make the change happen. Automated tests must be created if there are more changes to come.

To meet all these needs, a system must be version-structured, with many versions (business, developer, or production release) all in one; stageable, so that any version or part can be rolled back or forward; and tested automatically, as defined by the business.

The issue I find with Redshift is that it seems to be a one-way process: data flows into a data mart that is an OLAP solution, but it is not a total solution.

I have much to learn.

 

SSAS: The Operation Was Canceled Because of Locking Conflicts

I have so many cubes and I had never seen this error. Every one of my cubes was processing on the same schedule it always had, yet users started getting the error “The operation was canceled because of locking conflicts.” My ProcessFull jobs were hitting it, and so were the users. What’s going on? My users are mostly on SSAS Tabular, so all of them are read-only; as far as I was concerned they could never take an exclusive lock. But they do take a read lock, and some users started getting “A lock could not be obtained.” What’s going on?

It’s pretty simple. At first, all my jobs ran every hour from 6 am to 7 pm, so every hour on the hour every job kicked off. In the beginning it was one ProcessFull per database, but as time went on the daily PowerPivot refresh in SharePoint got moved to the SSAS Tabular server as an hourly ProcessFull, which then became partitioned, and then “we need real time” turned it into every 10 minutes.

What’s happening?

Processing (a process reads in a partition’s data and calculates): it needs to swap the old partition out for the new one, but it will wait until long-running queries are finished. It’s a write lock.

Read data (users take a read lock that blocks the write lock until they finish).

New read locks (these wait on the pending write lock until it is released). Users get an hourglass.

This is where the locking occurs. SSAS looks at the read locks and waits until the timeout threshold has been reached; it then times out the users. The users were running very small MDX queries in the beginning, but their queries have become very complex as they drill down into many details, producing bigger query plans that get blocked by the update. After a ProcessFull the cache has to be rebuilt, so only the first user running a given query is slow; the rest get a fast result, leaving me trying to analyze seemingly random users.
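While the blocking is happening, you can see who holds or is waiting on locks by querying the SSAS dynamic management views from an MDX window. A quick sketch (DISCOVER_LOCKS and DISCOVER_SESSIONS are standard SSAS DMV rowsets):

-- One row per lock, including the holding SPID and whether the
-- lock is granted or still waiting:
SELECT * FROM $SYSTEM.DISCOVER_LOCKS;

-- Match the blocking SPIDs back to users and their last command:
SELECT SESSION_SPID, SESSION_USER_NAME, SESSION_LAST_COMMAND
FROM $SYSTEM.DISCOVER_SESSIONS;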

First: stagger all processing start times (7:05, 7:10, and so on).

Next: use the XMLA <Parallel> tag

This processes all the objects in one transaction, in parallel, and does a single commit.
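A minimal sketch of such an XMLA batch (the database and cube IDs are placeholders); everything inside <Parallel> is processed in one transaction and committed once, so there is a single lock-swap window instead of one per job:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>MySalesDB</DatabaseID>
        <CubeID>Sales</CubeID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
    <!-- add one <Process> element per object; they all commit together -->
  </Parallel>
</Batch>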

Finally: the managers must make a tradeoff: reduce the ProcessFull frequency, or use ProcessAdd. ProcessAdd loads a small partition with a small footprint that does not lock your main partition. ProcessAdd is more complicated, because any update must go through a delta script: a fact of 8 with a new change to 3 must be delta-processed as -5, with a ProcessFull later to true everything up.

This is a simple analysis, but if the problem continues you must look at using ProcessAdd. If your database is in a star or snowflake schema, ProcessAdd is very simple: you have a fact of $3,000, an update of -$200 comes through, so you add a new fact of -$200 and the sum is $2,800; then you merge the partitions at the end of the day. Dimensions still have to be processed full.
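A minimal sketch of that delta pattern in T-SQL, with hypothetical table and column names:

-- Original fact: order 1001 booked at $3,000. An update to $2,800
-- arrives, so append the difference to a small delta table that feeds
-- the ProcessAdd partition instead of touching the main partition.
INSERT INTO dbo.FactSales_Delta (OrderKey, DateKey, SalesAmount)
VALUES (1001, 20130215, -200.00);

-- The cube sums across the main and delta partitions, so the total
-- trues up: 3000.00 + (-200.00) = 2800.00. At end of day, merge the
-- delta partition into the main one and clear the staging table.
SELECT OrderKey, SUM(SalesAmount) AS SalesAmount
FROM (
    SELECT OrderKey, SalesAmount FROM dbo.FactSales_Main
    UNION ALL
    SELECT OrderKey, SalesAmount FROM dbo.FactSales_Delta
) AS f
GROUP BY OrderKey;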

There is a much more detailed description here.

 

Some of my favorite code comments

Shamelessly stolen from http://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered

// I dedicate all this code, all my work, to my wife, Darlene, who will 
// have to support me and our three children and the dog once it gets 
// released into the public.
/// <summary>
/// This returns true if this control is supposed to be doing anything
/// at all for this request. Richard thought it was a good idea to load
/// the entire website during every request and have things turn themselves
/// off. He also thought bandanas and aviator sunglasses were "gnarly, dude."
/// </summary>

//When I wrote this, only God and I understood what I was doing
//Now, God only knows
// 
// Dear maintainer:
// 
// Once you are done trying to 'optimize' this routine,
// and have realized what a terrible mistake that was,
// please increment the following counter as a warning
// to the next guy:
// 
// total_hours_wasted_here = 42
// 
/**
* For the brave souls who get this far: You are the chosen ones,
* the valiant knights of programming who toil away, without rest,
* fixing our most awful code. To you, true saviors, kings of men,
* I say this: never gonna give you up, never gonna let you down,
* never gonna run around and desert you. Never gonna make you cry,
* never gonna say goodbye. Never gonna tell a lie and hurt you.
*/
stop(); // Hammertime!
// Replaces with spaces the braces in cases where braces in places cause stasis 
   $str = str_replace(array("\{","\}")," ",$str);
/*
 * You may think you know what the following code does.
 * But you dont. Trust me.
 * Fiddle with it, and youll spend many a sleepless
 * night cursing the moment you thought youd be clever
 * enough to "optimize" the code below.
 * Now close this file and go play with something else.
 */ 
 
// I don't know why I need this, but it stops the people being upside-down

x = -x;

Cubum autem in duos cubos, aut quadratoquadratum in duos quadratoquadratos, 
et generaliter nullam in infinitum ultra quadratum potestatem 
in duos eiusdem nominis fas est dividere cuius rei demonstrationem 
mirabilem sane detexi. Hanc marginis exiguitas non caperet.

...and translated into English:

(It is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second into two like powers. I have discovered a truly marvellous proof of this, which this margin is too narrow to contain.)

doRun.run();  // ... "a doo run run".

 /**
   * This method leverages collective synergy to drive "outside of the box"
   * thinking and formulate key objectives into a win-win game plan with a
   * quality-driven approach that focuses on empowering key players to drive-up
   * their core competencies and increase expectations with an all-around
   * initiative to drive down the bottom-line. I really wanted to work the word
   * "mandrolic" in there, but that word always makes me want to punch myself in
   * the face.
   */
private void updateFileCountLabel() {
 
options.BatchSize = 300; //Madness? THIS IS SPARTA!
 

Book review

I just got my copy of Instant MDX Queries for SQL Server 2012 by Nicholas Emond. I did a book review, and it’s a great way to jump into MDX on SQL Server 2008 or 2012 Multidimensional.

SSAS 2012 Multidimensional vs Tabular

I’m still getting used to it: the days of many dimensions with cubes are going to migrate into Tabular. I will be the first to admit it’s hard to let my SQL, Oracle, and other data warehouses go. But using Tabular you can process data and handle bad data using DAX. It lets your users use data. It’s all free with Excel 2010: download PowerPivot and see what you can do with your old data marts, data warehouses, SQL databases, Oracle databases, Access, etc. You can use them all together and let your power users create dynamic PowerPivot sheets on their own. Using SharePoint 2010 you can look at usage and import a workbook into Tabular if its data needs to be refreshed on a schedule. Multidimensional still has a purpose, but with Tabular you can put the model in memory and get the fastest results. Any small data marts I build as Tabular; huge ones I build as Multidimensional and then Tabular on top.

http://www.slideshare.net/d501159/02-sql-server2012sp1blazingfastperformancel200