Book Review: Principles of Data Management
Principles of Data Management: Facilitating Information Sharing. By Keith Gordon, British Computer Society, 2007, ISBN 978-1902505848, 274 pages.
Principles of Data Management might not sound like a thrilling title and, given its business focus, you might think it is not all that relevant to many readers of Ariadne. However, before dismissing it out of hand, consider this: may not the research outputs of an institution be regarded as business assets that require management (in, for example, an institutional repository)? On what other data does a university rely? Staffing, recruitment, enrolment, courses, library stock, costs? Without careful management of these disparate data, are not Higher and Further Education institutions ultimately losing money as they reinvent wheels or create complex workarounds to integrate data systems? Moreover, are these not workarounds that fail as soon as a new system is introduced? Why does it take so long for the IT people to do something as simple as provide the Virtual Learning Environment with course information? If any of these questions interest you, then perhaps the book's subtitle, 'Facilitating Information Sharing', will persuade you to read on.
Describing itself as a 'professional reference guide', Principles of Data Management attempts to 'explain ... the importance of data management to modern business' as well as discussing the 'issues of those involved'. Essentially it deals with the coal face of data management: what can go wrong without it, and the challenges faced by data professionals. It is important to note that the book's main focus is on data sharing within an organisation, for example, giving customer-facing businesses a competitive edge through timely and accurate information on patterns in a customer's purchase history.
In an increasingly competitive education market, these lessons are as relevant to tertiary educational institutions as they are to businesses, and there is a lot in this book for data professionals across our sector. There are also lessons here for those who devote their efforts to information sharing more widely, across institutions on the Web for example, though the book is fairly dismissive of XML standards development efforts:
"There have been a number of initiatives within particular industries to develop standard XML formats for the exchange of data between companies within that industry, but these initiatives are not coordinated. You can, therefore, end up with different formats for the same concept in different industries."
Initially this rattled me, but then I remembered the OpenXML and OpenDocument Format debate and the troubled history of RSS, and could see the point. Perhaps because of this, for the most part the book avoids interoperability standards, favouring instead the database and data management systems that underpin them. This seems sensible to me, because these systems are the foundations of all information sharing; an exchange format is nothing without content.
Chapter One opens the book by introducing the idea that information, alongside money, people, buildings and equipment, is a key business resource. While this seems rather obvious, it is astounding just how many organisations consider information to be something left up to the Information Technology Department rather than an issue for the business as a whole. The arguments are compelling, and this chapter could usefully have dwelt on them for longer; however, they are picked up again in Chapter Three, "What is Data Management?", in the section that explores life without it.
Chapters Two and Four change tack, plunging the reader deep into the technical detail of 'Database Development' and 'Corporate Data Modelling'. The former introduces relational databases and entity/relationship models, while the latter discusses the issues involved in building a data model for an entire organisation, rather than just one project area. One reason given for building such a model is to facilitate the sharing of data between applications across the organisation, but the book also states 'There is no definitive role for a corporate data model', suggesting, I think, that there are many reasons why you might need one; however, it also reads a little as though you might not always need one, which is perhaps an unfortunate impression to leave.
The idea of a generic data model introduced in Chapter Four is picked up again in one of the appendices, all of which expand on themes from the main chapters: Data Modelling Notation, Hierarchical and Network Databases, the aforementioned Generic Data Models, an example Data Naming Convention, a rather confusing Metadata Modelling appendix, Data Mining, HTML & XML and, finally, XML in Relational Databases.
The next four chapters (5 to 8) take the reader through some of the problems faced by data managers, including data definition and naming, data quality and data accessibility; along the way they explain database transaction management and discuss backup strategies. There is also a short chapter on metadata that gives three broad and different explanations of what it is (and within each definition is a series of sentences ending "...is also metadata"). This chapter serves as a timely reminder to this supporter of Institutional Repository Managers of what is confusing and frightening about metadata. Metadata for data managers is distinguished, in this chapter, from metadata for information professionals and libraries.
The penultimate group of chapters moves on to the roles involved in data management, examining the responsibilities of database administrators and how these differ from those of repository managers. It is important to highlight here that the repositories discussed in this book are not the repositories most familiar to Ariadne readers (since the last issue). Here we are talking about specialist database systems designed to safeguard data definitions and make them available across the enterprise. This is not, by any stretch, about storing, managing and showcasing research output.
The final chapter takes us on a whirlwind tour of recent industry trends including data warehousing, data mining, distributed database systems, object-oriented database systems, storing multimedia objects and, of course, the Web. There is also discussion of the growing trend to buy in data management 'solutions' to perform certain tasks, and of how this can lead to problems because these packages do not necessarily interoperate. The content of this chapter is variable in quality. The exploration of multi-dimensional models of data highlights a very powerful mechanism for querying data, and the section on packages raises some questions; but the chapter's examination of the Web is largely undeveloped, which is slightly disappointing.
In the introduction the author claims three target readerships for this book: data management practitioners, for whom it is a reference guide; IT and IS managers, who are looking to fill in the gaps in their knowledge of data management; and finally business managers, for whom an understanding of data management issues may help explain why IS keep asking for more data management professionals. The book probably favours the first two readerships, itself admitting that business managers should probably avoid the technical chapters (which dominate the book). It would have been good to see more for business managers, or at least more compelling arguments for why a business should want to invest in data management. I have entirely selfish reasons for this: increasingly, it seems, institutional repository managers are being asked to make the business case for the creation and continued funding of an institutional repository. If you consider the institutional repository to be a useful data management tool (and I do), then the reasons why an organisation needs data management would apply to institutional repositories too.
There is a fourth, implied, readership for this book: students of the BCS Certificate in Data Management Essentials. Indeed, the book often reads like a textbook, using bullet points that could clearly form the basis of exam questions along the lines of 'In what ways might a database system fail?'. While this is useful for students and in keeping with the reference format, it can sometimes feel contrived. The book also often presents the ideal (theoretical) situation, such as a generic corporate data model, but does not discuss what you might do when this is impossible due to constraints within the organisation. That, however, would take a much larger book!
In summary, Principles of Data Management: Facilitating Information Sharing fulfils its remit as a reference guide to data management. People from different backgrounds can dip into it and find starting points for a variety of topics, but you are left wanting more on each topic, so exploring the further reading section is almost essential; for a reference book, this is undoubtedly a good thing. Occasionally I found myself asking why certain chapters or sections were included. For example, the HTML/XML appendix explains what each one is and how they differ, but does not relate any of that to data management. The same goes for some of the complex technical explanations of data modelling, which remain firmly theoretical and leave it up to the reader to work out the practice.
On the whole, this is an interesting and thought-provoking book, but one that would benefit from some serious editing, since in its current form it will leave the reader interested but perhaps slightly unsatisfied. I look forward to the second edition!