Mark Lewis on Intelligent Content Design and Metadata
Content Matters

Season 3, Episode 7 · 1 year ago



Episode Summary:

Mark Lewis is a content engineer and the co-founder of the intelligent content consultancy, Caliper Content Services. Mark joined us on the Content Matters Podcast to talk about a topic near to his heart, and a source of frequent frustration – metadata. 

You think it’s simple to create metadata on your content, but there’s a lot to think about to build your metadata model and we cover it all in this episode. 

Key topics included: 

  • What is content engineering 
  • What is metadata and why it’s so important 
  • Why you can’t design metadata as an afterthought 
  • His advice for content strategists 

This is the first time we’ve gone deep into a discussion on intelligent content and Mark takes what could be a complex topic and explains it in easy-to-understand terms.  

Additional Resources:


Book: Object Technology: A Manager’s Guide 

About Mark Lewis:

Mark is a content strategist, content engineer, speaker, author and STC Fellow. He is recognized throughout the industry for his ability to prove the business case for moving static content to XML and intelligent content. Through design metrics, his approach can prove alignment across content strategy and corporate strategy. 

His experience spans a variety of industries including life sciences, finance, aviation, education, and oil & gas. Mark started the DITA Metrics community to promote the sharing of metrics, ROI and case studies. In his book DITA Metrics 101, Mark’s cost models offer a framework to determine the savings possible with enterprise-wide intelligent content implementation. 

He is a contributing author of DITA 101 and a technical reviewer of Managing Enterprise Content: A Unified Content Strategy, both by the Rockley Group. Mark is also a contributing author of The Language of Technical Communication and The Language of Content Strategy. 

Where You Can Find Mark:

Mark Lewis is the co-founder of Caliper Content Services, an intelligent content consultancy. He's the author of two books on intelligent content, one focused on DITA and the other on the business case for intelligent content. That one's actually coming out soon. As a principal content engineer, Mark's recognized for his ability to prove the business case for moving from static content to XML and intelligent content. Along with teaching and doing workshops on this topic, he's worked with a lot of companies designing intelligent content solutions and helping fix existing content solutions. Mark joined us on the podcast to talk about a key element of intelligent content, and that's metadata. He takes what could be a somewhat complex topic and explains it in easy-to-understand terms. I think when you listen to this you'll walk away with a newfound appreciation for intelligent content strategy and for one of the things that goes into developing the best content strategy and creating a great intelligent content solution for your organization. I hope you enjoy the podcast. Let's get started.

Let's get started by telling us a little bit about yourself and the work you do at Caliper Content Services.

Yeah, sure, that sounds good. In my first career I was a software engineer. I went to Georgia Tech in Atlanta and got a degree in information and computer science because I love math and science. I learned about object-oriented design and I wrote software for many years. I worked in a lot of small companies where they would say, "Hey, Mark, the software is great, but we need a user manual to go with it. Can you write that? Because we need to sell that with the product." I hadn't done that kind of writing before, but I enjoyed it a lot, fell in love with it, and decided that I wanted to pursue a career in technical writing.
Then in 2003 or 2004, I read Ann Rockley's book Managing Enterprise Content. I learned about XML and content management systems and I thought, cool, this is object-oriented content, and I loved object-oriented design. So I got it, and I've been on that path ever since, to the point where in recent years I'm now designing software to be used by writers. Because of that, I'm a co-founder of Caliper Content Services, where we take organizations through a content engineering process in which we design an intelligent content solution that solves the content problems preventing them from achieving their business goals.

It sounds like a lot of really interesting work. Here's a good question to get started with: what is content engineering, exactly?

It's a relatively new term; by new I mean it came out five or six years ago. Content engineering is a type of engineering for designing content solution software. An intelligent content solution is primarily software, and we've developed a process at Caliper that follows the best practices of software engineering established over the past thirty years. Again, ultimately we're really designing software here. Just like the classic software development life cycle, it has phases: discovery, requirements analysis, design, build, test, and train. In the discovery phase we're going to take a look at your current state of content, the content processes you're using now, and the publications you're trying to produce from that content. We look at how complex they are and whether these documents are even candidates for an intelligent content solution. Once we have an idea of the scope of everything we should be looking at, we know how complex this project could be.
Then we go into an analysis phase, where you want to gain an understanding of all the requirements for the different functional areas. An intelligent content solution is about authoring, collaboration, content management, publishing, and delivery, and you want a good understanding of your requirements in each of those functional areas. Like I said, you're going to go through a classic software design phase. We're going to be creating designs for authoring templates, where the authors can capture the content needed to produce the publications they're trying to get out the door. You've got to have publishing templates to get those publications out the door looking just how you want them to look, plus designs for workflows, content processes, and delivery channels. We need all these designs to satisfy all those requirements. Then you've got to build it. In the build phase you're either writing software from scratch, or integrating some software tools together, or configuring the software of some vendor that has a platform with all the different tools in it. That's the build phase. Testing and training are kind of self-explanatory, but they're all part of this content engineering process. Again, it mirrors software engineering very closely because ultimately you're designing a software solution. It's a process you go through so you don't miss anything. It minimizes risk and makes sure that the solution solves all of your content problems. So it's a very interesting exercise.
But my favorite part is where you go in, look at somebody's documents, and do a content analysis and design to come up with the content models and metadata that ultimately become the authoring templates, the authoring experience that the writers are typing into to capture that information. I'm an engineer at heart, so that's the puzzle I like to solve: the content models and metadata design.

I remember my days of content software development, and everything you said there just took me back to what it was like. But the content design part, I think, is really fascinating, and I think one of the key components of creating intelligent content today and designing your content models correctly is creating the metadata. So can we go a little deeper: what exactly is metadata?

That's a great point, because metadata is actually a key thing I want to talk about today. It's a critical...

...part of this content engineering process that I described. You always get the classic definition of metadata: it's data about data. Okay, well, that doesn't really help me a whole lot. So it's data about a thing, or the object in question. Okay, well, what do you mean by that, Mark? I teach by example, because I find that very effective. Let's take a Microsoft Word document. If you click File, then Info, you're going to get a panel or a dialog with some properties in it, and you'll see who created the document, the number of words in the document, and total editing time. That's metadata about the document as a whole, so that's one little example. It's data, and it's also metadata about that document. Another great example I like to give is an image file. You right-click on an image file and view its properties. Whatever operating system you're using, I don't care. Similarly, you bring up this properties panel and you're going to see different pieces of metadata, like the dimensions of the image and the pixel depth or color depth; DPI might be the term you're more familiar with. If you're working with a photograph, a specific kind of image file probably stored in a JPEG file, you've got a particular set of metadata called IPTC. It's an acronym that was developed in the late [decade unclear], and it's used for recording information about what camera was used to take the picture, the lens that was on the camera, exposure times, f-stops, ISO settings, who the photographer is, and who owns the copyright. It's all very specific metadata for that content type. And if you think about who might want that, think about some of the latest digital asset management tools, DAM tools.
That's exactly the kind of information that would be needed in a digital asset management system. You want to track information about this huge library of photographs that you've taken, and the way you make that happen is with metadata. Now, let's say the images in your world, your situation, aren't photographs, but you have images that you want to track some additional information on. To make that happen, you've got to build that into your design, build it into the content model for your images. Another example would be web pages. If you've ever done any web page development, you've probably seen that there's a metadata header embedded at the top of the file. That's metadata about that page, and there's a variety of metadata in that header. A simple example would be a person. For metadata about a person, take a resume: all the different kinds of information you'd have on a...

...resume. That's data about a person, metadata about a person. Or a health system, where you're tracking the different health information about a person. Those are some simple examples, but they work pretty well.

Yeah, those are really good examples, and they clarify it really well. I understand what it is, but I guess one of my questions would be: why is metadata so important?

Well, it's going to be different for each content type that you're dealing with, but in general it answers a question about the thing that you're applying metadata to. We'll start with this: there are different kinds of metadata. Thinking back to our intelligent content solution, we talked about capturing collaboration and workflow requirements and content processes. There's a type of metadata that's about the status in workflow of a piece of content in your content management system. It tells you where a piece of content is in the life cycle. You've started authoring a piece of content, so it's in the authoring state, or it's in the reviewing state, or it's in the editing state. So what state is this piece of content in? That's metadata. That's workflow metadata. Who approved this piece of content? Who is it assigned to? These are all metadata fields about workflow that are going to be common to any document, any component file that's in the CMS or in the workflow software. It doesn't matter what the content type is: if it's being workflowed, it's going to have this common set of workflow and status metadata. But then we get into a category where you're going to have metadata fields that are unique to a given content type. We talked about the photograph and that IPTC metadata, so that's unique.
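Editor's sketch: the split Mark describes, a common set of workflow metadata shared by every content type plus fields unique to one type, can be illustrated with a small Python example. The class names, field names, and sample values here are hypothetical and are not taken from any particular CMS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WorkflowState(Enum):
    """Life-cycle states mentioned in the conversation."""
    AUTHORING = "authoring"
    REVIEWING = "reviewing"
    EDITING = "editing"


@dataclass
class ContentItem:
    """Workflow metadata common to every piece of content (hypothetical fields)."""
    title: str
    state: WorkflowState = WorkflowState.AUTHORING
    assigned_to: Optional[str] = None
    approved_by: Optional[str] = None


@dataclass
class Photograph(ContentItem):
    """Adds metadata that only makes sense for photographs."""
    photographer: Optional[str] = None
    copyright_owner: Optional[str] = None
    iso_setting: Optional[int] = None


# A photograph carries both the shared workflow fields and its own fields.
photo = Photograph(
    title="Harbor at dawn",
    assigned_to="alex",
    photographer="Sam Doe",
    iso_setting=200,
)
photo.state = WorkflowState.REVIEWING
```

Any other content type, such as a recipe, would extend the same base class and so pick up the workflow fields without redefining them.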
These are what I would call classification and categorization metadata. This is metadata that's going to help authors search for and find a piece of content they're looking for in the content management system. On the flip side, this search metadata could also be used as filter metadata by the publishing engine to filter a piece of content in or out of the final publication. Again, it classifies and categorizes a piece of content, so it's probably going to be unique to that content type. So that's another category, another type of metadata. You can see how there's a variety of metadata that's a vital part of an intelligent content solution.

Yeah, it sounds like there's a huge amount. I have another question, and I'm sure I probably already know the answer, but I'm curious to hear your perspective. Are writers doing a good job of...

...applying metadata?

Well, it'd be interesting to know what your opinion is. I've seen a variety of, I'll say, performance across the industries, because I've worked with so many different industries, like airline, oil and gas, sports, and curriculum, and you just see a variety of performance in terms of how authors properly fill out metadata fields. Some are more disciplined than others. But that's not what bothers me. What bothers me is whether the metadata fields are available to be filled out in the first place. Are they in the design? In most of the discussions I hear, at conferences or in workshops talking with customers, they talk about metadata in terms of applying it long after the content's been written, like it's an afterthought, and that really bothers me. Or you've got an artificial intelligence that's crawling the content and trying to automatically apply metadata. How successful it is at that, I'm not a hundred percent sure. Another piece of this afterthought is software like Antidot or Zoomin that performs metadata enrichment, where they import source content into their platform and enrich it so that the metadata is available in the publication and in the reader experience. And that's great. I mean, those are two great products. What I really want is for metadata to be available in both the source content and the published content, so that it's available to authors looking for that piece of content in the source CMS, and it's available to end users, consumers, when they're searching the published content, whether that's in a PDF, a portal, or whatever the delivery experience is for them. So it's the afterthought.
Metadata as an afterthought is what I want to get people to understand is bad; start thinking of metadata from the beginning, as part of the design. Another example: you can think of index terms as metadata. It's data about a content component or a section or a topic. It doesn't matter if the index terms are embedded inline in the text or applied as metadata at the topic or section level. Hopefully the authors have added the index terms while they're writing, or at least before they've finished. Think of a hard copy book and the index in the back of it. It's a navigation feature that helps the reader find the content they're looking for. Quite often I'll go into a bookstore (how long has it been since I've been in a bookstore?), pick up a book, and to decide if I'm interested in it, I'll flip to the back and just look at the index terms to see what the book is about. Kind of old school, but I know there are other people out there who would agree with me. So it's a valuable navigation feature. My point is, hopefully the authoring tool you're using allows you to add index terms so that they're in the source, whether embedded inline or at the topic or section level, and hopefully the tool helps you maintain some sort of controlled vocabulary, so that all the authors are using the same terms rather than similar terms. You want a harmonized index. You don't want a situation where some writers put in the singular version of a term and others the plural, like frog versus frogs. You want to control the tense of the verbs. You want everybody to choose reduce as an index term rather than reduced.
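Editor's sketch: one way an authoring tool might enforce a controlled vocabulary for index terms is to map known variants to a single preferred term and flag anything it does not recognize. The vocabulary contents and function names below are hypothetical; the frog/frogs and reduce/reduced pairs come from the conversation.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
CONTROLLED_VOCABULARY = {
    "frog": {"frogs"},
    "reduce": {"reduced", "reducing"},
}


def build_lookup(vocabulary):
    """Map every preferred term and every variant to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred] = preferred
        for variant in variants:
            lookup[variant] = preferred
    return lookup


def harmonize(terms, vocabulary=CONTROLLED_VOCABULARY):
    """Return (accepted, rejected) index terms.

    Accepted terms are normalized to their preferred form with duplicates
    removed; rejected terms are not in the vocabulary and need review.
    """
    lookup = build_lookup(vocabulary)
    accepted, rejected = [], []
    for term in terms:
        preferred = lookup.get(term.strip().lower())
        if preferred is None:
            rejected.append(term)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, rejected


accepted, rejected = harmonize(["Frogs", "frog", "reduced", "toad"])
```

Here the singular and plural forms collapse to one entry, so the published index lists each concept once.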
So you want a harmonized index, and you need a controlled vocabulary to help you accomplish that. You want a tool that lets you put in index terms as metadata during the writing process, so that they're embedded in the source. Not as an afterthought: index terms up front, in the building of the content.

Thank you to the sponsor of the Content Matters podcast. Ingeniux is a leading provider of agile content management solutions. You can use Ingeniux CMS to manage and deliver modern websites, customer support portals, knowledge bases, and more. Ingeniux software enables content reuse, mobile and multichannel content delivery, and insightful content discovery. To learn more about how Ingeniux can support your content experiences, visit ingeniux.com.

I have worked in so many content management systems where that metadata is not there, and it's horrible, so I definitely understand what you're saying. So if treating metadata as an afterthought is bad, what do we want to do?

Well, this is where it's about content engineering. We touched on this at the very beginning. You need to determine the metadata you need in the model as part of the analysis and design phase, so that the metadata is designed into the content model and becomes part of the authoring template. For example, when I teach how to design content models, I often use the classic recipe model, because everybody knows what a recipe is. It's got a title and description, ingredients, and steps. Maybe it has an end result, what the dish should look like when you're all done, or maybe how you serve it. But everybody knows what's in a recipe. Those are the content elements of a recipe. To determine the metadata needed for the recipe so that we can design it into the model, I ask the students in the workshops how people might search for a recipe on a website.
We look at the searches, or queries, that they come up with and we reverse engineer them (again, engineering) to make sure that we have the metadata fields in place to support those searches. This is the process I take all my customers through to help them understand how to design metadata. Start with,...

...start with searches and then reverse engineer them. Here's an example: find all the recipes that are desserts and have the ingredient Slobovian chocolate. The system is going to search for documents where content type is recipe, so content type is the first piece of metadata we need. You need to know the content type of each piece of content. Other content types might be restaurant reviews or product reviews, but we only want recipes. So content type equals recipe. Next, the system is going to search for recipes where one of the ingredients is Slobovian chocolate. As I said, ingredient is a content element that's already in the recipe itself, so we don't need to add ingredient as metadata. That brings us to the last part of the query, desserts. How do we find desserts? We need a piece of metadata that tells us whether a piece of content is a dessert or not. We could support that by adding a metadata field called category that has values like dessert, entree, soup, and salad. When the author is writing a recipe, they'll fill out the category metadata field and select dessert from the controlled vocabulary for that field. So those three things, content type equals recipe, ingredient is Slobovian chocolate, and the category metadata set to dessert, are going to help us find that piece of content. We just reverse engineered that search. Call it a query if you want to, but it's a search query. That's a fairly simple example. You could add another condition to make it a more powerful search. Say I want to search for recipes that are sugar-free desserts, because I have a friend who can't have a lot of sugar for health reasons.
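Editor's sketch: the reverse-engineered search above combines one piece of metadata (content type), one content element (ingredient), and one designed-in metadata field (category). A minimal Python version, assuming a hypothetical in-memory library and made-up sample recipes, might look like this. Additional metadata conditions can be passed as extra keyword arguments.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """A piece of content with its elements and designed-in metadata."""
    content_type: str
    title: str
    ingredients: list = field(default_factory=list)  # content element
    metadata: dict = field(default_factory=dict)     # e.g. {"category": "dessert"}


def search(library, content_type, ingredient=None, **metadata_filters):
    """Return items matching the content type, ingredient, and metadata values."""
    results = []
    for item in library:
        if item.content_type != content_type:
            continue
        if ingredient is not None:
            names = [name.lower() for name in item.ingredients]
            if ingredient.lower() not in names:
                continue
        # Every requested metadata field must be present with the requested value.
        if any(item.metadata.get(key) != value
               for key, value in metadata_filters.items()):
            continue
        results.append(item)
    return results


# Hypothetical sample library for illustration only.
library = [
    Content("recipe", "Chocolate torte",
            ["Slobovian chocolate", "eggs"], {"category": "dessert"}),
    Content("recipe", "Mole soup",
            ["Slobovian chocolate", "chili"], {"category": "soup"}),
    Content("recipe", "Lemon tart",
            ["lemon", "butter"], {"category": "dessert"}),
    Content("restaurant review", "Cafe Cocoa", [], {"category": "dessert"}),
]

hits = search(library, "recipe",
              ingredient="Slobovian chocolate", category="dessert")
```

The search only works because the category field was designed into the model; a recipe without that field can never match the dessert condition.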
We can add a metadata field called sugar free that has the values true or false. Instead of a controlled vocabulary (well, technically the controlled vocabulary is true and false), it's a Boolean. That metadata field is going to allow us to execute a query where content type is recipe, ingredient is Slobovian chocolate, category is dessert, and sugar free equals true, and that's really going to narrow down what you find. Instead of finding a whole long list of recipes, you're going to find exactly what you're looking for. This is how we build metadata into the design of the content model, and that's going to enable intelligent searching on the intelligent content. You have to design it ahead of time, and that's my key point. So not only can the authors do intelligent...

...searching at the CMS; end users, or consumers, can perform intelligent searches on published content. So it's a win-win. Both the consumers, the readers of the content, and the authors benefit from building this into the design. We're putting intelligence into the content and then pushing that intelligent content out to the Internet. Object-oriented content is a way to think about it. You don't hear the term that often now, but we've been talking about the semantic web for a long time. I think Web 3.0 maybe is the new version of that term. This is how we make that happen: by designing intelligent content and then ultimately pushing it, in combination with its metadata, out to the web.

Do your students and clients find this to be a difficult exercise to do?

At first they usually do. I've got to take them through a couple of exercises and they start to get it. The thing they always say at the beginning, when I ask how people would search for a piece of content, is, "Oh, there are too many possible searches. This is going to be too hard, it's going to take too long." But I keep asking them how someone would search for this type of content, and we write down the searches, and then when they can't think of any more they say, "Oh, I think that's all of them." And it ends up not being that many variations, because you come up with what you think is a variation and it actually harmonizes to a kind of base query. That's just classic. Every time I take people through that example, it's alien to them at first, but once we get into it they're like, okay, this isn't so bad. So we take a look at this list of searches and we figure out what metadata is needed to support those searches.
Go back to photographs. Someone went through this exercise when they determined all those IPTC metadata fields. Back in the [decade unclear], there was someone who said, I want to be able to run a search and find a particular photograph. To support those complex searches, they had to design those IPTC fields into the metadata for the image files. So this isn't anything new. People have been doing this exercise for a long time.

Yeah, it's interesting to see how it all works for improving the ability to search. But does it also have benefits for other uses? Can you use metadata for things other than searching?

Sure. We talked about some of the categories: workflow, status, and reporting. Then there's the categorization and classification. Let's flip this around. Personalized content is a hot topic.

Yes.

So instead of someone searching the recipe website and pulling the content to themselves, you could push it to them. Let's say you're a member of a recipe website, and you create a login and fill out your profile, and you check the items you're interested in. If it's a smart website, they'll give you a chance to say these are the topics I'm interested in, and you check those. In this fictitious example, the member checks desserts and soups, and maybe there's a way to choose ingredients they're interested in, like blood orange chocolate and Slobovian chocolate.

You're making me hungry, saying all these chocolate things.

It's done.
Then, let's say once a month the recipe company wants to send an email out to the members of their website, and they want to automatically build this email based on recipes they know their members are going to be interested in. So once a month they run a search to see if there are any new recipes that match the items an individual checked in their profile. They're going to say, for this user, for this profile, look at their items: are there any matches with the new content that we created? If there are, they'll create an email with links to those recipes. So it's just the opposite. It's like the user running the query on the website saying, hey, I'm interested in these, but in this case the company's doing it based on your profile. They generate that list of recipes that match your profile, build an email, and send it out. You're going to receive an email that only has the recipes and topics in it that you're interested in. It's not a blanket email out to everybody, where you get a generic email in your inbox and you know it's generic. Most of the time it gets deleted. You're just like, it's unlikely that there's something in it that I'm interested in. But if you knew that you were a member of some content service or website where the emails they sent were about just the topics you were interested in, you're probably going to read that one. That builds up trust between the organization, the company, and that member. So this is one way to deliver personalized content. Again, personalized content is a hot topic, and this is one way to do it.
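Editor's sketch: the monthly push described above is the same metadata query run by the company instead of the reader. The Python below assumes a hypothetical profile with checked categories and ingredients, and treats a recipe as a match if it shares either one with the profile; that matching rule, the class names, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    title: str
    category: str
    ingredients: set
    is_new: bool = True  # created since the last monthly email


@dataclass
class Profile:
    email: str
    categories: set = field(default_factory=set)   # topics the member checked
    ingredients: set = field(default_factory=set)  # ingredients the member checked


def matches(recipe, profile):
    """Assumed rule: a shared category or a shared ingredient counts as a match."""
    return (recipe.category in profile.categories
            or bool(recipe.ingredients & profile.ingredients))


def build_monthly_emails(profiles, recipes):
    """Map each member's address to the new recipe titles matching their profile."""
    emails = {}
    for profile in profiles:
        picks = [r.title for r in recipes if r.is_new and matches(r, profile)]
        if picks:  # members with no matches get no email
            emails[profile.email] = picks
    return emails


recipes = [
    Recipe("Chocolate torte", "dessert", {"slobovian chocolate", "eggs"}),
    Recipe("Caesar salad", "salad", {"romaine"}),
    Recipe("Tomato soup", "soup", {"tomato"}, is_new=False),
]
profiles = [
    Profile("a@example.com", {"dessert", "soup"},
            {"blood orange chocolate", "slobovian chocolate"}),
    Profile("b@example.com", {"entree"}),
]

emails = build_monthly_emails(profiles, recipes)
```

The second member receives nothing this month rather than a generic email, which is the behavior the conversation argues builds trust.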
We talked about this several years ago at Content Marketing World, and you start to see the lightbulbs go off as some of the marketing people in the audience think, okay, this is how we create quality emails, quality meaning they've got content my customers care about.

Yeah, we always talk about personalization from the perspective of knowing the customer, but we never seem to talk about it from the perspective of how you get the right content in place and how you know what the right content is to send. So that's interesting to hear it that way. So your key takeaway is that it's a lot better to plan ahead and build metadata into the design as early as possible, instead of after the fact. And this also means storing the metadata in both the source content and the published content, then?

Yep, big benefit there. We've talked about classification and categorization metadata that would be used for searching and filtering...

...and reuse, and this is all part of how we make content intelligent and more valuable and useful, I think.

Yeah. So if you were going to give one piece of advice to a content strategist, what would it be?

I'm kind of giving away a skill that I think gives me an advantage. I'm going to say go learn object-oriented design. It's a methodology that I learned back when I was in college for software engineering, and it's really a big part of my mindset when I'm designing content solutions and content strategies for my clients. I think there was a speaker at a conference years ago who said to figure out what your learning schema is. I found that when I'm learning new technologies I tend to think of the software, and of what's being managed by the software, in terms of objects. That helps me learn new technologies, and it also really helps me understand someone's content and what they're trying to accomplish with it. At conferences there's a book that I give away. It's a great book. It's an old book, like a 20- or 25-year-old book, called Object-Oriented Technology: A Manager's Guide, by David Taylor. If anybody interested in learning object-oriented design raises their hand at a presentation, I'll give them a free copy of the book. It teaches you object-oriented design by using the human body as an example. Think of the body as this big system comprised of smaller systems, which are made up of organs, which are made up of cells. So you have this big thing breaking down into smaller things, and the book walks you through all of this and shows you how to think of the body and all of these systems as objects. It teaches you about inheritance, which is very critical to this way of thinking. Quick example: think of all the different types of cells that are in your body.
You have nerve and muscle and blood cells. The book will teach you to think about what every cell has in common, and what's unique about a given cell type. Without stealing too much from the book, this is how it teaches you about object design and inheritance. So I highly recommend it. It's a great way to learn object-oriented design, which will directly help you see how XML and content management systems are object-oriented content. I was going to say real quick, I actually gave a presentation on this to the STC, the Society for Technical Communication, my local chapter here in Clearwater, Florida. Shout-out to the Suncoast chapter. There's a link on the Caliper Content Services website to a recording of the presentation I gave to the STC about learning object-oriented design, so we'll make that link available somewhere when we post this podcast. So it's a great book, and check out that... we'll get you a link to that recording...

Yeah, I'll definitely make sure we put the links in the show notes for the podcast, in the descriptions.

Cool.

People can access that. It definitely sounds like a great book. But you're also working on a book. So, last question: what's your book about?

Yes, so our first book was DITA Metrics 101, which is about how to design content metrics. Our new book is called Making the Business Case for Intelligent Content, and it's very related to this content engineering process, because in order to buy your new content solution, you have to convince some executive to allocate funds and resources for it.

Yeah.

This is something that I struggled with myself many years ago, and I've seen other people struggle with it. I mean, in our roots we're technical writers, but we don't write business cases.
So what I wanted to do, from my perspective as a part of the tribe of technical writers, was show you how to look at your organization's business goals, content goals, and content problems and use that to create a business case that would resonate with the executive. That's the elevator description of what the book's about. It's something that I thought the industry needed from the perspective of a fellow tech writer, and we should be finishing that here in 2021.

It sounds like a really good book. That would definitely be one I would want to purchase if I was trying to convince someone to buy a software solution, a content solution.

Yep, it's a challenge.

I imagine. This was a great conversation, Mark. I think I learned a few new things to go along with my kind of old-school thinking; it kind of brought me up a little bit. I really appreciate having the talk today.

Yeah, it was a lot of fun, and I hope that the audience listening will go off and take some of the ideas that we talked about here, do further research, and find benefit from it. So thanks, Barb, appreciate it.

Thank you.
