Amazon Redshift

I started hearing about Amazon Redshift and I am quite interested in what they have done. As described here, they have taken the table structure and stored it column by column rather than row by row. As stated: “Using columnar storage, each data block stores values of a single column for multiple rows. As records enter the system, Amazon Redshift transparently converts the data to columnar storage for each of the columns.” This is a great solution, but what the business needs is a combined OLTP and OLAP solution, where we can replace our OLTP with a Hadoop, SAP HANA, or Redshift solution.

The issue is that all of these solutions give us near real time, but none of them is really an integrated solution where an OLTP change equals a data mart change and allows for business-driven self-service.

Business ideas have to be tested against current solutions and rules and then promoted to agile teams, where the cost of implementation is estimated. The business may see that the cost is worth it to streamline solutions, or that the current solutions do not warrant a redesign or a quick fix. If the change goes ahead, all team members have to be brought into a sprint to make it happen. Automated tests must be created if there are more changes to come.

To meet all these needs a system must be version structured, with many versions (business, developer, or production release) all in one; stageable, so that any version or part can be rolled back or forward; and tested automatically as defined by the business.

The issue I find with Redshift is that it seems to be a one-way process: data flows into a data mart that is an OLAP solution, but it is not a total solution.

I have much to learn

 

SSAS The Operation Was Canceled Because of Locking Conflicts

I have many cubes and I had never seen this error. Every one of my cubes was processing on the same schedule it always had, yet users started getting the error “The operation was canceled because of locking conflicts.” My ProcessFull jobs were hitting it, and so were the users. What’s going on? My users are mostly on SSAS Tabular, so as far as I was concerned they are all read-only and can never take an exclusive lock. But they do take a read lock, and some users started getting “A lock could not be obtained.” What’s going on?

It’s pretty simple. All my jobs ran every hour from 6 am to 7 pm, so every hour on the hour every job would run. It started off as one ProcessFull per database, but as time went on the PowerPivot daily refresh in SharePoint got moved to the SSAS Tabular server as an hourly ProcessFull, which then became partitioned, and then “we need real time,” so every 10 minutes.

What’s happening?

Processing (the process reads in data for the partition and calculates): it needs to swap the old partition out for the new one, but it waits until long-running queries are finished. It’s a write lock.

Read data (users take a read lock that blocks the write until they finish).

New read locks (these wait on the write lock until it is released): users get an hourglass.

This is where the locking occurs. SSAS looks at the read locks and waits until the timeout threshold has been reached, then it times out the users. The users were making very small MDX queries in the beginning, but now the queries are getting very complex as they drill down into many details, producing a bigger query plan, and they are blocked by the update. After a ProcessFull the cache has to be rebuilt, so only the first user running a query is slow; the rest get a fast result, leaving me trying to analyze random users.
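
If you want to see who is holding or waiting on locks while this is going on, the DISCOVER_LOCKS DMV lists the current lock holders and waiters. A minimal PowerShell sketch, assuming the SSAS cmdlets are available and using a hypothetical server name:

# Load the SSAS cmdlets (SQLASCMDLETS in the 2012/2014 era; the newer SqlServer module also has Invoke-ASCmd)
Import-Module SQLASCMDLETS -DisableNameChecking

# Hypothetical SSAS instance name; replace with your own
$server = "MySSASServer"

# DISCOVER_LOCKS lists current locks, their types, and the sessions holding or waiting on them
Invoke-ASCmd -Server $server -Query 'SELECT * FROM $SYSTEM.DISCOVER_LOCKS'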

First: stagger all the processing start times, e.g. 7:05, 7:10, and so on.

Next: use the <Parallel> tag.

This processes all of the objects inside one transaction and does a single commit.
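
For reference, a minimal sketch of what that batch can look like, sent from PowerShell with Invoke-ASCmd; the server name and object IDs (SalesDB, Dim Date, Sales) are made up for the example:

# Two ProcessFull commands wrapped in <Parallel> inside one <Batch>:
# everything is processed in a single transaction with one commit at the end
$xmla = @"
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>SalesDB</DatabaseID>
        <DimensionID>Dim Date</DimensionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
    <Process>
      <Object>
        <DatabaseID>SalesDB</DatabaseID>
        <CubeID>Sales</CubeID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>
"@

Invoke-ASCmd -Server "MySSASServer" -Query $xmla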

Finally: the managers must make a tradeoff between reducing the ProcessFull frequency and using ProcessAdd. ProcessAdd adds a small partition with a small footprint that does not lock your main partition. ProcessAdd is more complicated because any update must go through a delta script: a fact of 8 that changes to 3 must be delta-processed as -5, with a ProcessFull later to true everything up.

This is a simple analysis, but if the problem continues you must look at using ProcessAdd. If your database is in a star or snowflake schema, ProcessAdd is very simple: you have a fact of $3000, an update of -$200 comes through, so you add a new fact row of -$200, the sum is $2800, and you merge the partitions at the end of the day. Dimensions still have to be processed in full.
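
A minimal sketch of that end-of-day step, again as XMLA sent through Invoke-ASCmd: ProcessAdd the delta partition (assumed here to be bound to a view that returns only the new delta rows) and then merge it into the main partition. All of the IDs are hypothetical:

# ProcessAdd the small delta partition, then merge it into the main partition.
# Assumes "Fact Sales Delta" is bound to a query/view that returns only the delta rows.
$xmla = @"
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Fact Sales</MeasureGroupID>
      <PartitionID>Fact Sales Delta</PartitionID>
    </Object>
    <Type>ProcessAdd</Type>
  </Process>
  <MergePartitions>
    <Sources>
      <Source>
        <DatabaseID>SalesDB</DatabaseID>
        <CubeID>Sales</CubeID>
        <MeasureGroupID>Fact Sales</MeasureGroupID>
        <PartitionID>Fact Sales Delta</PartitionID>
      </Source>
    </Sources>
    <Target>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Fact Sales</MeasureGroupID>
      <PartitionID>Fact Sales Main</PartitionID>
    </Target>
  </MergePartitions>
</Batch>
"@

Invoke-ASCmd -Server "MySSASServer" -Query $xmla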

There is a much more detailed description here.

 

GIT Flow

Git Flow is a powerful source control workflow that allows multiple features to be developed off of the main project. The difference between plain Git and Git Flow is the swim lanes: the master swim lane is for releases, and develop holds features that have been completed. New features are put into their own swim lanes, which are just branches. This makes it easy to see the breakdown of product development and to manage releases.

[Figure: git-flow timeline]
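
A minimal sketch of the day-to-day commands, assuming the git-flow extension is installed (the branch and version names are just examples):

# One-time setup: creates the master and develop swim lanes and asks for branch prefixes
git flow init

# Start a feature swim lane: creates feature/reporting off develop
git flow feature start reporting

# ...commit work on the feature branch as usual...

# Finish the feature: merges feature/reporting back into develop and removes the branch
git flow feature finish reporting

# Cut a release from develop, then finish it to merge into master and tag it
git flow release start 1.2.0
git flow release finish 1.2.0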

SQL Server and PowerShell 4.0

I’ve found myself using PowerShell much more frequently to do things I used to do with SSIS, just because it’s faster to develop and less complicated to deploy.

The one issue that really had me puzzled was noticing that scripts I had tested successfully from the PowerShell ISE were failing when deployed to QA or UAT.

The answer, which I found after a lot of confusion, was that Windows was at PowerShell 4.0 but SQL Server was still using PowerShell 2.0.

 

This article was very helpful with a workaround: launching a new instance of PowerShell 4.0 from PS 2.0. The drawback is that now you have to go onto the server and manipulate files, which in a locked-down production environment is time consuming and defeats the point of using PowerShell with SQL Server Agent in the first place.
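
A minimal sketch of that kind of workaround inside a SQL Server Agent PowerShell job step; the script path is hypothetical:

# Show which PowerShell version the Agent step is actually hosting (2.0 in this case)
$PSVersionTable.PSVersion

# Workaround: shell out to the full Windows PowerShell installed on the server.
# The path always says v1.0 even when the installed engine is 4.0.
& "$env:windir\System32\WindowsPowerShell\v1.0\powershell.exe" `
    -ExecutionPolicy Bypass `
    -File "C:\Jobs\Load-Staging.ps1"    # hypothetical script file on the server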

Microsoft SQL Server 2014 Community Technology Preview 2

Microsoft has released SQL Server 2014 Community Technology Preview 2. It has a lot of changes on the database engine side but not much in OLAP (Multidimensional and Tabular), SSIS, or SSRS.