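A data steward is the person responsible for managing the definition, quality, and proper use of an organization's data.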
In a way, it falls to the data steward to bring order to the organization's data chaos. For any organization that takes data governance seriously, this is a crucial role, and appointing the right person to it is important.
When do you need to hire a data steward?
There are usually three circumstances under which an organization has to look for a data steward:
- A new data governance committee is being formed and the core team to drive it is being put together.
- The employee currently playing this role has moved to a new role, team, or function.
- New data elements that need governance and stewardship are continually being identified, usually as part of new projects.
In each of these situations, a new data steward can be either hired or identified from within the organization.
Organizations usually prefer to fill the role internally when possible, because a current employee who meets the prerequisites already has context and knowledge of the organization. External recruitment is considered only if no one internally has the ability or the willingness to take on the role.
Whether they choose to recruit for the role or identify someone internally, different organizations take different approaches to the selection of a data steward.
- The BDE approach: the steward is someone who has expertise in the subject area of the business data elements (BDE) (s)he is supposed to handle.
- The Functional approach: the steward is someone who is an expert in the functional area.
Choosing one approach to the exclusion of the other is not prudent: an expert in the BDEs may lack the necessary understanding of the business function, and vice versa, and so may not be able to fulfill their data stewardship responsibilities effectively. In some organizations, the boundary between the two approaches is unclear, which leads to confusion and redundancy.
‘Discovering’ data stewards
If you are undertaking this exercise for the first time in your organization, we recommend a ‘watch and spot’ approach to identifying the right data stewards. Kick off your data exercises: understand your data landscape, build a business glossary, and develop flow diagrams and charts that show how data is generated and transferred through your systems. Over the course of this highly collaborative process, the individuals with the necessary expertise in each area and a genuine interest in the project will naturally emerge. If they don’t, you know you will have to hire someone externally.
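One way to make the flow diagrams from this exercise queryable is to record them as a lineage map. The sketch below is a minimal illustration, assuming a plain dictionary from each data element to the elements it is derived from; all element names (`report.monthly_revenue`, `crm.orders.amount`, and so on) are hypothetical. Tracing an element back to its root sources shows which systems, and therefore which teams, generate the data, which may help you see whom to watch for as candidate stewards.

```python
from collections import deque

# Hypothetical lineage map: each data element -> the elements it is derived from.
# Names are illustrative only, not taken from any real catalog or tool.
LINEAGE: dict[str, list[str]] = {
    "report.monthly_revenue": ["warehouse.orders.amount", "warehouse.fx_rates.rate"],
    "warehouse.orders.amount": ["crm.orders.amount"],
    "warehouse.fx_rates.rate": ["finance.fx_feed.rate"],
    "crm.orders.amount": [],
    "finance.fx_feed.rate": [],
}


def upstream_sources(element: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every element that `element` depends on, in breadth-first order."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(element, []))
    while queue:
        current = queue.popleft()
        if current in seen:  # skip elements reached through more than one path
            continue
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


def root_sources(element: str, lineage: dict[str, list[str]]) -> list[str]:
    """Upstream elements with no inputs of their own: where the data originates."""
    return [e for e in upstream_sources(element, lineage) if not lineage.get(e)]


if __name__ == "__main__":
    print(root_sources("report.monthly_revenue", LINEAGE))
```

For the sample map, the revenue report traces back to the CRM order amounts and the finance FX feed, so the owners of those two systems would be natural people to involve early.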
What should you look for in a data steward?
An ideal data steward is also an ontologist: someone deeply interested in the relationships between concepts, data, and entities in a domain, who wants to help manage its complexity by organizing the data. Apart from this aptitude and interest, here are some specific qualifications to look for when hiring for this role:
- A thorough understanding of programming (Python, Perl, PHP, C/C++/Java, etc.)
- Expertise in relational databases, if your organization still uses them, as well as experience working with SQL-based systems.
- A strong background in non-relational (NoSQL) databases, which are increasingly common, including knowledge of MapReduce, BigTable implementations, sharding, and Memcache, and familiarity with tools such as Hadoop/HBase.
- A sufficient understanding of data modeling.
- Real-world experience in data warehousing.
- Business acumen, so that (s)he can understand the business context and see the bigger picture.
- Good technical writing skills as communication and documentation are key parts of this role.
Managing data stewards effectively
Getting the right people for the role is just the first part; how well you utilize their time and expertise is the bigger, more important part. Here are some common data stewardship challenges and how to tackle them:
Lack of clarity about stewardship goals
The best way to provide clarity to the data stewards and to the DG program team as a whole is to set clear and measurable goals for each data steward. Here are some metrics to track:
- Data availability: do all the stakeholders feel that they have ready and easy access to the data they require?
- Timeliness: does the data arrive by the time it is needed?
- Completeness: are all the data points that are meant to be captured actually captured?
- Accuracy: is the data correct, up-to-date, and reliable?
- Integrity: does the data remain intact and consistent as it moves through the pipeline?
- Conformance: does the data adhere to set standards?
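Some of these metrics can be computed directly from the data. The sketch below is a minimal illustration for three of them (completeness, conformance, and timeliness) over a hypothetical set of customer records; the field names, the required-field list, the simplified email pattern standing in for a "set standard", and the load deadline are all assumptions. Accuracy and integrity usually need reference data or checksums to compare against, so they are not shown.

```python
import re
from datetime import datetime

# Hypothetical customer records; field names and values are illustrative only.
RECORDS = [
    {"customer_id": "C001", "email": "ana@example.com", "loaded_at": datetime(2024, 1, 1, 9, 0)},
    {"customer_id": "C002", "email": None, "loaded_at": datetime(2024, 1, 1, 9, 30)},
    {"customer_id": "C003", "email": "not-an-email", "loaded_at": datetime(2024, 1, 2, 8, 0)},
    {"customer_id": "C004", "email": "li@example.com", "loaded_at": datetime(2024, 1, 1, 10, 0)},
]

REQUIRED_FIELDS = ("customer_id", "email")  # assumed "meant to be captured" fields
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified standard, not RFC 5322


def completeness(records: list[dict]) -> float:
    """Share of required field values that are actually populated."""
    if not records:
        return 1.0  # policy choice: nothing expected, nothing missing
    total = len(records) * len(REQUIRED_FIELDS)
    filled = sum(1 for r in records for f in REQUIRED_FIELDS if r.get(f) not in (None, ""))
    return filled / total


def conformance(records: list[dict]) -> float:
    """Share of populated email values that match the agreed format."""
    emails = [r["email"] for r in records if r.get("email")]
    if not emails:
        return 1.0  # policy choice: no values, so no violations
    return sum(1 for e in emails if EMAIL_PATTERN.match(e)) / len(emails)


def timeliness(records: list[dict], deadline: datetime) -> float:
    """Share of records loaded on or before the deadline."""
    if not records:
        return 1.0
    return sum(1 for r in records if r["loaded_at"] <= deadline) / len(records)


if __name__ == "__main__":
    cutoff = datetime(2024, 1, 1, 12, 0)  # hypothetical delivery deadline
    print(completeness(RECORDS), conformance(RECORDS), timeliness(RECORDS, cutoff))
```

Expressing each metric as a share between 0 and 1 keeps them comparable, which makes it easier for a steward to set targets and track them over time.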
An overall Data Quality Score based on these parameters can also be assigned to each BDE and tracked on a regular basis.
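A weighted average is one simple way to roll the individual metrics into a single score per BDE. The sketch below assumes each dimension has already been scored between 0 and 1; the weights are hypothetical placeholders that each organization would agree on with its stakeholders, and a weighted average is only one of several reasonable aggregation choices.

```python
# Hypothetical weights; here completeness and accuracy are assumed to matter most.
WEIGHTS: dict[str, float] = {
    "availability": 1.0,
    "timeliness": 1.0,
    "completeness": 2.0,
    "accuracy": 2.0,
    "integrity": 1.0,
    "conformance": 1.0,
}


def data_quality_score(
    dimension_scores: dict[str, float], weights: dict[str, float] = WEIGHTS
) -> float:
    """Weighted average of per-dimension scores (each in [0, 1]), scaled to 0-100."""
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted = sum(weights[d] * dimension_scores[d] for d in weights)
    return 100 * weighted / total_weight


if __name__ == "__main__":
    # Illustrative scores for a single BDE.
    scores = {
        "availability": 1.0,
        "timeliness": 0.8,
        "completeness": 0.9,
        "accuracy": 0.9,
        "integrity": 1.0,
        "conformance": 0.7,
    }
    print(data_quality_score(scores))
```

Raising an error on a missing dimension, rather than silently scoring it as zero, keeps a gap in measurement from being mistaken for poor data quality.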
Inefficiencies in managing the stewards’ bandwidth
There will likely always be more situations that need a steward's input than stewards have bandwidth for. It is practically impossible for a data steward to pore over the documentation of every project, or sit in on every meeting, to identify areas for governance and stewardship. This is where dedicated Project Data Stewards can help. Their primary responsibility is to engage closely with projects, identify new data elements that need governance, and integrate them into the larger data flow.
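Part of this triage can be made routine by checking a project's data fields against the business glossary. The sketch below is a hypothetical illustration: the `GlossaryEntry` structure, its sample entries, and the case-insensitive name matching are assumptions, and real matching would also need to handle synonyms and renamed fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """One governed business data element (BDE) in a hypothetical business glossary."""
    name: str
    definition: str
    steward: str


# Illustrative glossary; real entries would come from your governance tooling.
GLOSSARY: dict[str, GlossaryEntry] = {
    entry.name: entry
    for entry in (
        GlossaryEntry("customer_id", "Unique identifier assigned to a customer", "Sales ops steward"),
        GlossaryEntry("order_amount", "Order value in the billing currency", "Finance steward"),
    )
}


def ungoverned_elements(project_fields: list[str], glossary: dict[str, GlossaryEntry]) -> list[str]:
    """Return project fields with no glossary entry, normalized and in first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for field in project_fields:
        key = field.strip().lower()  # assumed naming convention: lowercase glossary keys
        if key not in glossary and key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    fields = ["customer_id", "Order_Amount", "loyalty_tier", "churn_score", "loyalty_tier"]
    print(ungoverned_elements(fields, GLOSSARY))
```

A project data steward could run a check like this against each new project's schema and take only the flagged elements to the wider stewardship team.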
Disengaged data stewards
When data stewardship is not the person's sole responsibility and they have other pressing work commitments, stewardship duties can take a back seat. This may show up as a data steward not responding in a timely manner, missing deadlines, or simply being unavailable for data governance meetings. You can prevent this by making sure, right at the beginning, that the person understands the importance of their role in the overall data health of the organization. Periodic conversations with a mentoring member of the data governance council can help keep them on the straight and narrow. It is also a good idea to share regular reports and updates that highlight the work and outcomes achieved by individual stewards. Other stewards will not only be motivated but also see the benefits of better data quality.
We close this post with an interesting quote on TDAN from a data steward with many years of experience: “I learned that Data stewardship is about trust and transparency. Your actions speak louder than your words and it is very true in the case of a data steward. It didn’t matter how many times I presented about the importance of data quality and adhering to data standards; but when I actually sat down with data owners and corrected the issues first-hand, it always drove home the point.”