Opinion: Why is model-driven engineering unpopular in industry, and what can we do about it?

Model-driven engineering (MDE)¹ is a branch of software engineering that aims to improve the effectiveness and efficiency of software development by shifting the paradigm from code-centric to model-centric.

I have been in the MDE community on and off for about 15 years. My supervisor at the University of L’Aquila, Alfonso Pierantonio, introduced me to MDE in 2003. Back then, the approach was still in its infancy and was not even called model-driven engineering. I wrote my Bachelor’s thesis in 2003 on code generation based on the Unified Modeling Language (UML), and my Master’s thesis in 2006 on model versioning.

During my four years as a PhD candidate at the University of Bergen, I researched formal aspects of model versioning and multi-level modeling, and successfully defended my PhD thesis in 2011. During the following four-plus years as a researcher at SINTEF, I conducted applied research on domain-specific languages and models@run-time for automating cloud management platforms. I also compared two-level and multi-level techniques for modeling cloud application topologies. My work has led to several publications in journals and conference proceedings.

Eventually, I decided to come back to the business world, with the aim of transferring these research results to industry. As an advisor and manager at Norway’s largest IT organizations, I have worked with architectures and solutions as well as trained colleagues and clients. While I did not expect MDE to be widespread, I did expect UML and domain-specific languages (DSLs) to be an integral part of these activities. Unfortunately, I have been disappointed.

What’s the problem?

Take my statements with a grain of salt—this is an opinion piece. My personal experience is not representative of the whole industry, and IT professionals working in other domains may have a widely different experience. Nevertheless, here is what I learned:

Industry does not use UML or uses it poorly

For the majority of IT professionals I have met (architects, developers, operators, etc.), “modeling” means drawing boxes and arrows in PowerPoint or similar applications. Only about 20% of IT professionals I have met are familiar with UML, and only a fraction of them use it, in a few cases with questionable results. “I’ve seen things you people wouldn’t believe…” including diagrams mixing the syntaxes of use case, component, and sequence diagrams.

The lack of adoption of UML or any other general-purpose modeling language is problematic. It introduces ambiguities and increases the chances for misunderstandings.

Industry uses DSLs but does not know about it

The majority of IT professionals I have met use DSLs regularly. However, none of them knew what a DSL was before I told them.

In the field of cloud services, the template languages of AWS CloudFormation, Azure Resource Manager, and Google Cloud Deployment Manager, as well as the OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA), are all examples of DSLs. However, developers and operators often regard the specifications written using these DSLs as configuration files rather than models.

The lack of understanding of DSLs and modeling in general leaves a lot of untapped potential. The abstract syntax and semantics of models enable automated reasoning and facilitate automation, which brings me to the following point.
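As a rough illustration of what treating a specification as a model buys you, here is a minimal Python sketch. The tiny topology format, the `Node` and `Topology` classes, and the `hosted_on` key are all hypothetical and only loosely inspired by cloud topology templates; this is not TOSCA or any vendor's syntax. Once the parsed specification is loaded into an abstract syntax with typed elements, a tool can check references and derive a deployment order instead of treating the file as opaque configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of the (hypothetical) abstract syntax."""
    name: str
    kind: str
    hosted_on: str | None = None  # name of the node that hosts this one


@dataclass
class Topology:
    nodes: dict[str, Node] = field(default_factory=dict)

    @classmethod
    def from_spec(cls, spec: dict) -> Topology:
        """Build the model from an already parsed YAML/JSON document."""
        topology = cls()
        for name, body in spec["nodes"].items():
            topology.nodes[name] = Node(name, body["kind"], body.get("hosted_on"))
        return topology

    def problems(self) -> list[str]:
        """Automated reasoning, step 1: report dangling host references."""
        found = []
        for node in self.nodes.values():
            if node.hosted_on is not None and node.hosted_on not in self.nodes:
                found.append(f"{node.name}: unknown host '{node.hosted_on}'")
        return found

    def deployment_order(self) -> list[str]:
        """Automated reasoning, step 2: hosts come before what they host.

        Assumes problems() returned an empty list.
        """
        order: list[str] = []
        visiting: set[str] = set()

        def visit(name: str) -> None:
            if name in order:
                return
            if name in visiting:
                raise ValueError(f"hosting cycle involving '{name}'")
            visiting.add(name)
            host = self.nodes[name].hosted_on
            if host is not None:
                visit(host)
            visiting.discard(name)
            order.append(name)

        for name in self.nodes:
            visit(name)
        return order


spec = {"nodes": {"web": {"kind": "WebApp", "hosted_on": "vm"},
                  "vm": {"kind": "Compute"}}}
topology = Topology.from_spec(spec)
print(topology.problems(), topology.deployment_order())
```

The same dictionary read as a plain configuration file would support none of these checks without ad hoc code scattered across the tools that consume it.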

Industry does not exploit models@run-time and automation

The majority of IT professionals I have met use different data structures at design-time and run-time in their systems, which is time-consuming, error-prone, and hinders automation.

Models@run-time provides an elegant solution to this problem. It enables a consistent representation of both design-time and run-time information.

The vision of the 2000s, a utopian world where software engineers would spend 80% of their time modeling and generate 80% of the source code through model-to-text transformation, has never become a reality. Nevertheless, models@run-time has gradually become part of self-adaptive systems, not only in prototypes but also in production.

Several cloud management platforms, such as the TOSCA-based Cloudify, have successfully applied this technique. Developers and operators specify cloud application topologies as models at design-time, while monitoring and adaptation engines programmatically manipulate these models at run-time. This is also the foundation of several AIOps platforms.
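The idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Cloudify or any other platform is implemented: the `Service` and `ApplicationModel` classes, the `observe` and `adapt` functions, and the 0.8 load threshold are all hypothetical. The point is that the structure written at design-time is the same one that monitoring updates and the adaptation engine edits at run-time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    instances: int         # desired instance count, set at design-time
    max_instances: int     # upper bound, set at design-time
    cpu_load: float = 0.0  # observed value, written at run-time


@dataclass
class ApplicationModel:
    services: dict[str, Service] = field(default_factory=dict)


def observe(model: ApplicationModel, name: str, cpu_load: float) -> None:
    """Monitoring writes run-time information into the same model."""
    model.services[name].cpu_load = cpu_load


def adapt(model: ApplicationModel, threshold: float = 0.8) -> list[str]:
    """Adaptation edits the model; a platform would then enact the changes.

    Returns a description of each change, standing in for real API calls.
    """
    actions = []
    for service in model.services.values():
        overloaded = service.cpu_load > threshold
        if overloaded and service.instances < service.max_instances:
            service.instances += 1
            actions.append(f"scale {service.name} to {service.instances}")
    return actions


# Design-time: the operator specifies the topology as a model.
model = ApplicationModel({"web": Service("web", instances=1, max_instances=2)})

# Run-time: monitoring and adaptation work on that very model.
observe(model, "web", cpu_load=0.95)
print(adapt(model))
```

Because there is only one representation, there is no translation layer between what was designed and what is running, which is where the time-consuming and error-prone work mentioned above usually hides.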

Similar to the point above, the lack of understanding of models@run-time and automation is a missed opportunity.

What do we do about it?

Taking my experience into account, here is what I would recommend to academia, standardization organizations, and tool vendors:

Teach modeling and MDE in Bachelor’s degrees

I believe that fundamental concepts such as abstraction, modeling, metamodeling, descriptive and prescriptive models, concrete and abstract syntax, structural and attached constraints, informal and formal semantics, linguistic and ontological typing, conformance, to name a few, should be well understood by all future IT professionals. Therefore, I believe that they should be part of the curriculum of any computer science and software engineering degree, even at the Bachelor’s level.

At the universities where I have studied and worked, modeling is partly introduced in software engineering courses and further elaborated in MDE courses. The problem is that software engineering courses tend to focus on using UML for documentation purposes, while MDE courses typically belong to Master’s programmes only.

Fix UML

When I studied UML during my Bachelor’s degree, one of the elements that baffled me was aggregation, graphically represented by a hollow diamond shape. Its concrete syntax was akin to that of composition, graphically represented by a filled diamond shape. However, its semantics was indistinguishable from that of association.

Martin Fowler discussed this issue in his book UML Distilled already 15 years ago: “Aggregation is the part-of relationship. It’s like saying that a car has an engine and wheels as its parts. This sounds good, but the difficult thing is considering what the difference is between aggregation and association. In the pre-UML days, people were usually rather vague on what was aggregation and what was association. Whether vague or not, they were always inconsistent with everyone else. As a result, many modelers think that aggregation is important, although for different reasons. So the UML included aggregation […] but with hardly any semantics. As Jim Rumbaugh says, ‘Think of it as a modeling placebo’.”

In other words, aggregation is meaningless. If you disagree, I challenge you to provide me with definitions of the semantics of aggregation and association, where the distinction between the two is unambiguous.

Unfortunately, the OMG keeps the concept in UML and even researchers in the community keep using it. In fact, I have witnessed UML class diagrams containing aggregation at many of the conferences on MDE I attended between 2008 and 2016. Worst of all, I have seen UML class diagrams containing both aggregation and association. Either the authors believed there is a difference between aggregation and association, or they confused aggregation with composition.

Aggregation is just one example. The syntax and semantics of several other concepts in UML are questionable as well. If UML is the best general-purpose modeling language the MDE community can offer, and if not even researchers in the community can use it properly, it is fair to expect that industry will either avoid UML or struggle to use it.

Improve frameworks for DSLs and models@run-time

During my years as a researcher, I defined the syntax of two DSLs and contributed to the corresponding models@run-time environments. The Eclipse Modeling Framework (EMF) and Connected Data Objects (CDO) have come a long way since their inception. However, even for IT professionals with a PhD in the subject, implementing DSLs and models@run-time environments with today’s frameworks requires a considerable amount of effort.

My co-authors and I have discussed this issue in one of our papers: “Our assessment is that EMF and CDO are well-suited for DSL designers, but less recommendable for developers and even less suited for operators. For these roles, we experienced a steep learning curve and several lacking features that hinder the implementation of models@run-time […]. Moreover, we experienced performance limitations in write-heavy scenarios with an increasing amount of stored elements.”

Perhaps some of these issues have already been addressed by now. Nevertheless, I believe the frameworks for DSLs and models@run-time have to reach a much higher maturity level for industry to use them in production.

Reduce research on model transformation

During my years in academia, I have seen all sorts of research papers on model transformation. My gut feeling is that, for any model transformation need, there is a technique for that in the literature. The question is: to what extent are these techniques adopted? I suspect that the most optimistic researchers in the community expected model transformations to become as widespread as compilers. Well, they have not.

I believe the MDE community should rather concentrate on leveraging existing techniques to address concrete limitations in their frameworks. In 2018, even something as simple as renaming an element of an Ecore model does not lead to the automatic update of the attached OCL constraints.
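To show why such a rename breaks things, here is a deliberately simplified Python sketch. It does not use the EMF or OCL APIs; `Attribute`, `TextConstraint`, and `LinkedConstraint` are hypothetical names, and the second variant is only one possible way tooling could keep constraints in sync. A constraint that refers to a model element by its name goes stale when the element is renamed, while a constraint that holds a reference to the element follows the rename for free.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Attribute:
    """Stand-in for a named element of a metamodel."""
    name: str


@dataclass
class TextConstraint:
    """Refers to the attribute by name, like a constraint stored as text."""
    attribute_name: str
    minimum: int

    def check(self, instance: dict) -> bool:
        return instance[self.attribute_name] >= self.minimum


@dataclass
class LinkedConstraint:
    """Refers to the attribute object itself, so renames are followed."""
    attribute: Attribute
    minimum: int

    def check(self, instance: dict) -> bool:
        return instance[self.attribute.name] >= self.minimum


age = Attribute("age")
text_rule = TextConstraint("age", minimum=18)
linked_rule = LinkedConstraint(age, minimum=18)

# Rename the model element, as a refactoring in a model editor would.
age.name = "years"
person = {"years": 21}

print(linked_rule.check(person))  # still resolves the renamed attribute
try:
    text_rule.check(person)
except KeyError as stale:
    print("stale constraint, unknown attribute:", stale)
```

A real framework would more likely rewrite the constraint text as part of the rename refactoring, but the underlying requirement is the same: the tool has to know which constraints depend on which elements.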

Do not get me wrong—I am guilty myself, as I have published at least four papers on model transformation that did not achieve the impact I wished. Perhaps researchers in the community should move to other research topics, which brings me to the following point.

Increase research on multi-level modeling

As a researcher, I enjoyed working with multi-level modeling. My intuition is that this technique should be the gold standard for self-adaptive systems.

My co-authors and I have investigated multi-level modeling in the field of cloud management platforms, and the results are promising: “a smaller language definition in the multi-level case, with some other benefits regarding extensibility, flexibility, and precision.”

Despite these results, I believe the road ahead is long. The frameworks for multi-level modeling need to reach a sufficient technology readiness level for this technique to come out of the academic niche. I wish the multi-level modeling community were larger.

Conclusion

While I still believe MDE to be the best solution in several problem domains, I am afraid that the approach is stuck in the technological “valley of death.” To bridge this gap, academia and industry should collaborate more closely in the future. What do you think?

Footnotes

¹ Some researchers in the field would argue that this approach is not an engineering discipline and that it should be called model-driven development (MDD) instead. The Oxford English Dictionary defines engineering as “the branch of science and technology concerned with the design, building, and use of engines, machines, and structures.” Considering that software and data are in fact structures, I am perfectly comfortable with the term model-driven engineering, and I will not distinguish between MDE and MDD.

Conner Ward
20 December 2022 at 22:55

There is a reason the pressure for UML and modeling comes from the academic side. Academia places a large emphasis on analysis, and thus on taxonomy. It follows that in such an environment the clear solution to problems would be a strong taxonomic system, i.e., MDE.

I want to believe. I want to believe that DSLs and code generation from modeling will work. However, they have not, and I think that has to do with most software engineering in practice favoring construction towards goals rather than analysis, and with real life requiring the flexibility to use duct tape when things do not neatly fit the taxon (similar to criticisms of the fragility of tightly coupled, large-scale OOP). You also see this same kind of dichotomy of emphasis between theoretical correctness and ease of construction/flexibility in the space of programming languages, between academia/aerospace (Lisp, Haskell, Ada) and the myriad of popular languages in the rest of industry.

I concede that DSLs emerge when a use case for them has been sufficiently commodified, e.g., the instantiation of virtual machines / containers (AWS, Docker, etc.). However, the point of arguing over the distinction between a ‘config’ and a ‘model’ is unclear. Think about engineers’ usage of tools from a UX perspective. It’s not that engineers “use UML poorly” or that it is a matter of education, in the same way it would be ludicrous to suggest that an app’s misused interface is the fault of the user. It is my observation that the time and effort spent maintaining the model rarely justify its existence, especially at Agile CI/CD pace, as the model is almost always trailing the codebase.

Lastly, and I think this is a larger problem, management / orgs / academia want to capture human capital from above, in the form of knowledge bases, wikis, models, reports, system diagrams, and documentation, but provide tools and systems that rarely incentivize use by the engineers at the bottom (though at the scale of government / aerospace, stakeholder structures seem to overlap more closely with those of academia).

Conner
20 December 2022 at 23:11

I could go on, because I do think there is still a significant amount of development to be had in general-purpose languages: taking the best ideas from academic languages and finding syntax that incentivizes their use, or achieving goals like concurrency through abstraction at the language level (e.g., Golang).

Furthermore, I think the future of ‘programming languages’ is more a synthesis of AI/NLP natural language techniques and more formal syntaxes like programming/markup languages. You can already see such semi-rigid learned syntaxes emerging in how people query Google: they are constrained by how Google works on a technical level, and people have learned which syntaxes get the best results from the machine (a bit like prompt engineering). However, I think such languages / syntaxes are unlikely to emerge top down, but rather syncretically / reflexively over time through the interplay of mass user input and machine response, as in the aforementioned Google example or in the next generation of NLP-derived AI systems.

Maybe I am just biased, as a programmer, about the degree to which semi-formal written language syntaxes similar to programming languages will be integrated into the mainstream.

Maybe I am living an echo of your past experience with UML.

We shall see!