This month we are talking to Chris Ward, Head of Software and Chief Architect, as he gives his thoughts on the top 6 challenges associated with a modern approach to data transformation in the current diverse and big data world.
- Designing for scale
When we talk about designing for scale, it is important to note that data itself has many different dimensions. The volume, variety and velocity of the data that we deal with do make it somewhat unusual… but this is nothing new, and basic relational systems are nowadays well equipped to deal with millions, billions, even trillions of records. When I talk about designing for scale, I am referring directly to the scaling of requirements or scaling for different technologies. The tools that are available to us are constantly changing as are the data sources themselves, and so any solution that we build must reflect this. We must adopt a modular approach to the design – allowing us to adapt to technical changes, react more quickly to business changes and not put all our eggs into one vendor or technology basket!
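To make the modular idea a little more concrete, here is a minimal Python sketch, assuming a very simple design in which every data source sits behind the same small interface. The names `Source`, `InMemorySource`, `CsvTextSource` and `count_rows` are hypothetical and only illustrate the pattern; they do not come from any particular product or vendor.

```python
import csv
import io
from typing import Iterable, Protocol


class Source(Protocol):
    """Hypothetical contract: anything that can yield rows as dictionaries."""

    def read(self) -> Iterable[dict]: ...


class InMemorySource:
    """Rows that are already held in memory."""

    def __init__(self, rows):
        self._rows = list(rows)

    def read(self):
        return iter(self._rows)


class CsvTextSource:
    """Rows parsed from CSV text, standing in for a file or an export."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return csv.DictReader(io.StringIO(self._text))


def count_rows(source: Source) -> int:
    # Downstream logic depends only on the contract, not on the technology.
    return sum(1 for _ in source.read())


print(count_rows(InMemorySource([{"id": 1}, {"id": 2}])))   # 2
print(count_rows(CsvTextSource("id,name\n1,a\n2,b\n3,c\n")))  # 3
```

Swapping one technology for another then means writing a new class that satisfies the same contract, while the logic built on top of it stays unchanged.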
- Planning for change and the unknown
I have worked on many different data solutions over the years, and a key challenge that has remained common throughout is the ability to plan for and react to change – how do we keep the business agile and focussed? There are many tools out there that let users build their own silos of data and query them, looking for and predicting trends, but this type of work can at times prove counter-productive. Take Simpson’s Paradox, for example. When the same question is asked of different sets of data in isolation, common patterns and themes may emerge; when those data sets are combined, the effects may disappear or, in some cases, even be reversed. Conclusions drawn from isolated silos can therefore contradict the combined picture, which makes planning or preparing for the unknown even harder.
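A small worked example may help to show how a reversal of this kind can arise. The figures below are purely illustrative and are not taken from any real project: two options, A and B, are compared in two hypothetical segments, "north" and "south", first separately and then with the segments combined.

```python
from collections import defaultdict

# Illustrative figures only: (segment, option) -> (successes, trials)
results = {
    ("north", "A"): (81, 87),
    ("north", "B"): (234, 270),
    ("south", "A"): (192, 263),
    ("south", "B"): (55, 80),
}


def winner_by_segment(results):
    """Return the option with the higher success rate within each segment."""
    by_segment = defaultdict(dict)
    for (segment, option), (successes, trials) in results.items():
        by_segment[segment][option] = successes / trials
    return {seg: max(rates, key=rates.get) for seg, rates in by_segment.items()}


def winner_combined(results):
    """Return the option with the higher success rate once segments are pooled."""
    totals = defaultdict(lambda: [0, 0])
    for (_, option), (successes, trials) in results.items():
        totals[option][0] += successes
        totals[option][1] += trials
    rates = {option: s / t for option, (s, t) in totals.items()}
    return max(rates, key=rates.get)


print(winner_by_segment(results))  # {'north': 'A', 'south': 'A'}
print(winner_combined(results))    # B
```

Option A wins in each segment, yet option B wins overall, because the two options were tried in very different proportions across the segments. Someone looking at only one silo would reach the opposite conclusion to someone looking at the combined data.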
- Investing in data quality
Data quality and veracity have always been a challenge when dealing with, or representing, data. Throughout my career I have seen the people responsible for collating data treated as the people who need to fix it, and this really shouldn’t be the case. I’ve always approached data quality with a different view: reporting on it, exposing it and quarantining it if appropriate, so that it can be fixed within the source system rather than in a data solution. Attempting to fix data quality issues from within the solution can end up being a real pain. In no time at all you end up with a massive solution containing so much logic, and so many complex business rules trying to cater for every different scenario, that it is impossible to maintain. In my experience, engaging with the different business stakeholders and having a strong, influential business sponsor is a far more effective route to getting data issues fixed at the source.
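The "report, expose and quarantine" approach could be sketched roughly as follows. This is a minimal Python illustration under assumed names: the rules in `RULES`, the `QualityReport` class and the `triage` function are all hypothetical, and the two checks are placeholders for whatever the business actually agrees on. The point is that failing records are set aside with their reasons attached and are never silently corrected.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    clean: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)  # (record, failed rule names)


# Hypothetical rules: name -> check that returns True when the record passes.
RULES = {
    "customer_id present": lambda r: bool(r.get("customer_id")),
    "amount is non-negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def triage(records, rules=RULES):
    """Split records into clean and quarantined, without altering any of them."""
    report = QualityReport()
    for record in records:
        failures = [name for name, check in rules.items() if not check(record)]
        if failures:
            report.quarantined.append((record, failures))
        else:
            report.clean.append(record)
    return report


records = [
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "", "amount": -5},
]
report = triage(records)
print(len(report.clean))         # 1
print(report.quarantined[0][1])  # ['customer_id present', 'amount is non-negative']
```

The quarantined list, with its reasons, is what would be reported back to the owners of the source system, so the fix happens there and the solution itself stays small.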
- Winning business confidence
Start small, win small. I have always believed that the trick to winning business confidence is simple – start small and build from there. Always keep in mind that the people who are going to use this data need to have absolute confidence in it. They need to know that they can provide the necessary evidence with this data, tracking it to its source and knowing exactly what has happened to it along the way. Talking the user’s language is also hugely important: data is nothing without context, so what we present must relate to the conversations that are happening. Another benefit of the ‘small wins’ idea is the speed at which we are able to work. Long-running transformation programmes are a thing of the past; the world moves quickly and so must we – slow work means missed opportunities. This fast-paced, bite-sized approach to working allows for a business-driven organisation, one that can adapt and respond easily as the environment changes and business priorities move.
If we look at the parable of the beer and nappies in the context of data mining, we can see the benefit of looking at these small wins and acting on them quickly. The legend goes that some time ago the retail giant Wal-Mart decided to put all its checkout data into a huge warehouse and allow data mining. One of the trends that appeared in the data surprised everyone: seemingly, on a Friday night there was a correlation between the sale of beer and nappies. Following a little further research, it transpired that fathers were stopping on their way home from work to get nappies and, at the same time, were picking up a few beers. Deciding to capitalise on this, Wal-Mart strategically placed beer and nappies close to each other during the highlighted times and watched sales rocket. Whilst there is some doubt surrounding how true this story is, the underpinning lesson remains. The ability to bring data together and use data mining to unearth trends that would otherwise most likely remain undiscovered allows us to act quickly on, and benefit from, small wins.
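As a rough illustration of how a pattern like beer and nappies might surface, the Python sketch below computes "lift", a standard measure from association analysis that compares how often two items are bought together with how often that would happen if the purchases were unrelated. The story does not say which technique was actually used, so this is only one plausible choice, and the baskets are invented for the example.

```python
def lift(baskets, item_a, item_b):
    """Lift of item_a and item_b: above 1 means they co-occur more than chance."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if item_a in basket)
    count_b = sum(1 for basket in baskets if item_b in basket)
    count_both = sum(1 for basket in baskets if item_a in basket and item_b in basket)
    if n == 0 or count_a == 0 or count_b == 0:
        return 0.0
    # P(A and B) divided by P(A) * P(B)
    return (count_both / n) / ((count_a / n) * (count_b / n))


# Invented checkout baskets, for illustration only.
baskets = [
    {"beer", "nappies"},
    {"beer", "nappies", "crisps"},
    {"milk"},
    {"bread", "milk"},
    {"beer"},
    {"nappies", "milk"},
]

print(round(lift(baskets, "beer", "nappies"), 2))  # 1.33
```

A lift above 1 suggests the two items appear together more often than independent purchases would explain. It is a prompt for the "little further research" described above, not a conclusion in its own right.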
- Embracing automation
The benefits of automation follow directly from the idea of small, quick wins. We no longer want to be hard-coding data transformations or business rules. Data automation and metadata management make it possible to avoid this, and fit naturally with the world of Information Catalogues. Recent advances in data profiling and Machine Learning mean that the ability to query and extract meaningful content from vast amounts of data has become far more efficient and reliable. In fact, the more data the better. Tricky, time-consuming manual extraction should be a thing of the past; we are now in a time of benefiting from as much data as possible.
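One way to picture the move away from hard-coded rules is a transformation that is driven entirely by metadata. The Python sketch below is a minimal illustration under assumed names: the column names, the `MAPPINGS` list and the `TRANSFORMS` registry are hypothetical, and in practice the mapping metadata would more likely live in a catalogue or configuration store than in the code itself.

```python
# Hypothetical mapping metadata: which source field feeds which target field,
# and which named transformation to apply on the way.
MAPPINGS = [
    {"source": "cust_nm", "target": "customer_name", "transform": "strip"},
    {"source": "ord_amt", "target": "order_amount", "transform": "to_float"},
    {"source": "ctry", "target": "country", "transform": "upper"},
]

# A small registry of reusable transformations, looked up by name.
TRANSFORMS = {
    "strip": str.strip,
    "upper": str.upper,
    "to_float": float,
}


def apply_mappings(row, mappings=MAPPINGS, transforms=TRANSFORMS):
    """Build a target row by applying each mapping entry to the source row."""
    target = {}
    for mapping in mappings:
        transform = transforms[mapping["transform"]]
        target[mapping["target"]] = transform(row[mapping["source"]])
    return target


row = {"cust_nm": "  Ada Lovelace ", "ord_amt": "19.99", "ctry": "gb"}
print(apply_mappings(row))
# {'customer_name': 'Ada Lovelace', 'order_amount': 19.99, 'country': 'GB'}
```

Adding a field or changing a rule then becomes a change to the metadata and not to the code, which is what keeps the quick, bite-sized way of working sustainable.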
- Business outcome focus
As with anything, the work we do with data needs to have a purpose. No matter how efficient or automated, the work we do requires time and resource. If we can’t give a reason why we’re doing something, or attribute value to it, then we really need to question why we’re doing it. The key to tying it all together lies in business outcome focus. If we start with an outcome and work back from it, we can be sure that each activity that stems from it has a purpose and directly impacts the business in some way, showing a direct ROI from the effort involved.