Software engineering blog of Clément Bouillier: January 2011

Monday, January 3, 2011

Data backup solution based on RSync with a NAS

I have experienced some limited data loss in the past due to a hard disk crash, and recently my first external hard drive started to have some issues… I can only repeat the popular recommendation: think seriously about backing up your precious data as soon as the alerts start piling up, like repeated hard drive scans at startup (or when you plug in an external drive), suspicious behavior when reading data from the drive… That is what I did recently, and it saved plenty of personal photos and videos from being lost…
From that moment on, I decided to set up a permanent backup solution. After having a look at web-hosted solutions (I was not completely convinced), I finally went for my own NAS, a DLink DNS-323, which is really easy to configure and extend (it runs embedded Linux). It was also a chance to get my hands dirty with Linux toys again :) (it had been a long time…), but don't be afraid to try! (unless you only use a computer to write documents and emails… in that case it could take you several long nights to get it running)

Rsync over SSH as the main toys

Rsync is an incremental file synchronization tool for Unix systems. It is command-line based, but it can be really powerful when combined with scripts. I will let you search the web for details on this tool; I will only show how I use it for backups. Note that there are several published solutions built around RSync. I was particularly inspired by wiki.dns323.info and BackupNetClone. I created my own scripts because the first one is too minimalist (based on BAT scripts… ouch) and I found the second one too intrusive on client computers (it needs an SSH daemon and an RSync server on each).

SSH will be used to secure RSync file synchronization.

To use it with Windows clients, the first thing to do is to install Cygwin (or another Linux emulation layer). It is really simple: click Next until the package selection, select the RSync and OpenSSH packages (just the main packages, dependencies will be pulled in automatically), and then click Next until the end.
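If you prefer the command line, the Cygwin installer can also be run unattended; a minimal sketch, assuming the installer executable is called setup.exe and is launched from a Windows command prompt:

REM unattended Cygwin install of the two packages we need (dependencies are pulled in automatically)
setup.exe -q -P rsync,openssh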

I will come back to the client setup (don't be afraid, it is just a script that has to be scheduled…) after a quick look at the server side, i.e. the NAS.

Set up NAS

My NAS, a DLink DNS-323, is Linux based. You have to use a fun_plug script that is loaded at NAS startup. You can use ffp, which includes some applications, in particular the SSH and RSync daemons. Follow the instructions at the following link to install it: wiki.dns323.info/howto:ffp.

Typically, you will set up a backup account on the DNS-323 through the admin interface (http://[NAS IP]): add a "backup" account in the Advanced tab. Next, you can change its home directory and shell in /etc/passwd.
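For illustration, the backup account's entry in /etc/passwd could end up looking like this (a sketch only: the UID/GID, the /mnt/HD_a2 data volume and the /ffp/bin/sh shell are assumptions that depend on your own NAS and ffp installation):

# hypothetical /etc/passwd entry for the backup account
backup:x:501:501:backup user:/mnt/HD_a2/backup:/ffp/bin/sh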

Set up clients

First, you have to configure the client once; after that, you will mostly only change the configuration of which folders to back up.

First time set up

I explain here what you have to do once for each client computer (i.e. each computer to back up):

1. generate SSH keys that will be used next:

ssh-keygen -t dsa -b 1024

You can keep the default key path. Do not provide a passphrase if you want to automate your backups (otherwise you would be asked for it each time a backup runs).

2. Copy the client's SSH public key to the NAS with:

ssh-copy-id -i ~/.ssh/id_dsa.pub backup@[dns-323 IP]


I have packaged this in a script along with some simple configuration (IP, backup user name…).
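For the record, a minimal sketch of what such a wrapper could contain (the script name and the NAS_IP/BACKUP_USER variables are my own naming, to be adapted to your setup):

#!/bin/sh
# firstTimeSetup.sh - hypothetical wrapper around steps 1 and 2 above
NAS_IP="192.168.0.10"     # IP address of the DNS-323
BACKUP_USER="backup"      # backup account created on the NAS

# generate the key pair only if it does not exist yet (no passphrase, see above)
[ -f ~/.ssh/id_dsa ] || ssh-keygen -t dsa -b 1024 -N "" -f ~/.ssh/id_dsa

# push the public key to the NAS (the account password is asked once)
ssh-copy-id -i ~/.ssh/id_dsa.pub "${BACKUP_USER}@${NAS_IP}"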

What to backup?


My scripts (explained below) search for configuration files, each one giving a path to back up and its destination path on the NAS:

# Local path to backup, use /cygdrive/[drive letter]/... syntax
LOCAL_PATH_TO_BACKUP="/cygdrive/c/testbackup"

# [Optional] Target Rsync module -> override global settings
#TARGET_MODULE="backup"

# Target path in module
TARGET_PATH="test"

Launch a backup


I have a launchBackup.bat script that launches the backup.sh script through Cygwin. In this script, I load the configuration from the setup, set up an SSH tunnel, then start rsync and finally close the SSH tunnel.
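For illustration, the skeleton of backup.sh could look roughly like the following (just a sketch: SSH_TUNNEL_LOCAL_PORT comes from the rsync command below, while setup.conf, NAS_IP and BACKUP_USER are names I am assuming here; 873 is simply the default rsync daemon port):

#!/bin/sh
# backup.sh - rough skeleton (sketch only, not the complete script)
. ./setup.conf    # loads NAS_IP, BACKUP_USER, SSH_TUNNEL_LOCAL_PORT, ...

# open an SSH tunnel forwarding a local port to the rsync daemon on the NAS
ssh -N -L "${SSH_TUNNEL_LOCAL_PORT}:127.0.0.1:873" "${BACKUP_USER}@${NAS_IP}" &
TUNNEL_PID=$!
sleep 5    # give the tunnel a few seconds to come up

# ... run the rsync command shown below for each configuration file ...

# close the SSH tunnel
kill "${TUNNEL_PID}"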

The RSync command is:

rsync -aivx --port ${SSH_TUNNEL_LOCAL_PORT} --chmod +rwx ${LOCAL_PATH_TO_BACKUP} 127.0.0.1::${TARGET_MODULE}/${TARGET_PATH}

The parameter names speak for themselves; -aivx are common rsync options. I have not yet set up incremental backups with --link-dest (the hard-linking option), and I am still wondering about using --delete, which also removes from the server whatever has been removed from your client folders (in that case you have to make sure each server path is used by only one client, to avoid massive deletions…).
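For the record, a sketch of what the incremental variant could look like with --link-dest, assuming each run writes into a dated directory and hard-links unchanged files against the previous day's directory (the dated layout and the TODAY/YESTERDAY variables are my own choices, not part of the scripts described here):

# hypothetical incremental run: unchanged files are hard-linked to yesterday's backup
TODAY=$(date +%Y-%m-%d)
YESTERDAY=$(date -d yesterday +%Y-%m-%d)
rsync -aivx --port ${SSH_TUNNEL_LOCAL_PORT} --chmod +rwx \
  --link-dest=../${YESTERDAY} \
  ${LOCAL_PATH_TO_BACKUP} \
  127.0.0.1::${TARGET_MODULE}/${TARGET_PATH}/${TODAY}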

Don’t forget to check your Firewall settings if you get some “Connection refused”-like errors.

Scheduling


You can simply rely on the Windows Task Scheduler. And you are done!
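For example, the task can be created from the command line (a sketch: the task name, the path to launchBackup.bat and the 02:00 start time are arbitrary choices of mine):

REM run the backup script every night at 02:00 (hypothetical path and task name)
schtasks /create /tn "NAS backup" /tr "C:\backup\launchBackup.bat" /sc daily /st 02:00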

Assessment…

Not so pricey: I got the NAS for €100 plus €70 for a 1.5 TB hard disk drive. It is quite easy to set up, open since you remain the only master of your backups, and thus easily configurable and extendable, with unlimited possibilities.

About security: it removes the hardware failure risk, but it does not protect against more serious domestic risks like burglary or fire… For that I have an idea: build a small network of such NAS (two to start…) with some relatives for example, providing us with a backup solution along the way :)

And a final word about environmental impact: I bought an energy meter, and the NAS consumes only 10 watts when idle (most of the time), which is quite good in the end.

Sunday, January 2, 2011

Combine Hierarchy/Work Item/Date dimensions in TFS 2010 cube

I started using the TFS 2010 cube to build a Product Burndown chart based on Story Points (Y axis) over time (X axis) and State (series) of the User Story and Defect work item types of our custom process template (customized from the User Story and Bug types found in the MSF Agile process template).
 
Basics...
With Excel, it was really simple: connect to the cube, use the Date and Work Item dimensions with the Story Points measure and tada, done. Ok, well, in fact, it was not exactly what we wanted, because we work on a legacy application... so we have User Stories and Defects on different subjects, which we have modelled using a Project work item type that contains the related User Stories and Defects.
For example, we have a "Maintenance" Project (which never really ends, btw...) containing all the production bugs we are fixing, and "Project A" and "Project B" Projects, each one classically having a given budget and start/end dates.
 
...deeper... ouch! Problem!
My team runs several Projects on one application at the same time, each from 20 man-days up to several hundred man-days, but we keep a single iteration/product backlog for the whole team on this application. Each Product Backlog item is therefore related to one Project.
I would then like to have a Product Burndown chart restricted to the items related to a particular Project. It would help to see how its Product Backlog items evolve over time and to manage the effort needed to keep this Project on track.
I thought the Work Item Tree dimension would help me... but when I tried to add it as a filter to my Excel report, it did nothing!
In fact, this is expected behaviour. I understood it by digging into SQL Server Analysis Services features (which I had never looked at before...) and the TFS 2010 cube configuration. There are several explanations:
  • Dimensions are associated with one or several Measure Groups, and the 3 dimensions I would like to use are never all together in the same Measure Group. For example:
    • Work Item includes the Work Item and Date dimensions, but not the Work Item Tree one
    • Work Item To Tree includes the Work Item and Work Item Tree dimensions, but not the Date one
  • The Story Points measure is a calculated member associated with the Work Item History Measure Group, and it is computed from a hidden Story Points measure of this Measure Group

(Figure: TFS cube Measure Groups)

Trust me, the other Measure Groups do not include even two of these dimensions together... and none includes all three.
 
Solution
So the solution was:
  • to add a view to the TFS data warehouse combining the fact tables that contain the Work Item facts and the Hierarchy facts (no change to the data warehouse loading process!)
ALTER VIEW vFactWorkItemHistoryToTree
AS
SELECT wih.*, witt.WorkItemTreeSk
FROM dbo.FactWorkItemHistory wih
INNER JOIN dbo.DimWorkItem wi1 on wih.WorkItemSK = wi1.WorkItemSK
INNER JOIN dbo.DimWorkItem wi2 on wi2.System_Id = wi1.System_Id
INNER JOIN dbo.vFactWorkItemToTree witt on wi2.WorkItemSK = witt.WorkItemSK
GO
  • to change the TFS cube Data Source View to add the new view and link it to the related Dim tables (derived from FactWorkItemHistory for example); just use the designer included in Business Intelligence Studio (you open the Analysis Services database directly on the server with it)

(Figure: Data Source View designer)

  • to add a new Measure Group with the 3 dimensions I need (derived from the Work Item History Measure Group for example, with the Work Item Tree dimension added)

(Figure: modified TFS cube Measure Groups)

  • to add a calculated member based on the hidden Story Points measure of the new Measure Group (within Business Intelligence Studio, open the Team System cube -> Calculations tab to add the new calculated member, and associate it with the Measure Group via the Calculation Properties icon)
-- Story Points with Hierarchy Tree dimension
-- Just a part of MDX request to show we use vFactWorkItemHistoryToTree member of our new measure group...
CREATE MEMBER CURRENTCUBE.[Measures].
[Microsoft_VSTS_Scheduling_StoryPoints_Tree] AS
...
Sum
(
[Date].[Date].[Date].MEMBERS.Item(0) : [Date].[Date].CurrentMember
,[Measures].[vFactWorkItemHistoryToTree Microsoft_VSTS_Scheduling_StoryPoints]
)
...
 
(Figure: calculated member)
 
...even deeper with the same solution
Then I can do what I wanted with Excel, i.e. filter on each Project to get a Burndown chart for each of them.
Note that this can be applied to any work item hierarchy. For example, we also have a Release work item type in our process template, which allows us to manage release contents. We can then follow how a Release backlog evolves through a Burndown chart.
 
(Figure: Project Burndown chart)
 
Don't be afraid to look at the TFS cube: it took me 2 days to find out what I needed (starting with no skills in SSAS...), and it can be very powerful.