Minimum Viable Product Brainstorming

I Have a "Killer Idea" And Now I Need Software...

Written By: Andrew Fruhling, Chief Operating Officer at Calavista

Every day, a million and one thoughts fly around in our heads. Sometimes, they're killer ideas — the ones we think will make us never have to work again for the rest of our lives. Other times, they are just products of a very overactive imagination.

However, we know that it is one thing to dream and another to transform that idea into an actual business. Often, a vital part of this transformation is creating a Minimum Viable Product (MVP). This is a term Eric Ries made popular with his book, "The Lean Startup." He defines an MVP as “that version of a new product a team uses to collect the maximum amount of validated learning about customers with the least effort.”

In simple terms, an MVP is a product with just enough features to attract customers to validate the product idea as soon as possible. MVPs primarily help companies determine how well their product resonates with the target market before committing a larger budget to it. If you have a potentially killer idea, you should start by creating an MVP to validate the idea.

Approaches to Building an MVP

There are several approaches to building a minimum viable product and this is where things often get challenging – especially for people who do not have much software development experience. I have been in the software development industry for many years, and I have built teams using many different approaches. Some work better than others. People often ask, “How do I get my initial software built?” I thought it would be good to put my thoughts down in writing.

Let’s start with a disclaimer. I was a customer of Calavista’s twice before joining the company in 2020. As you would expect, I am a fan of the model used by Calavista, and I have seen it work well for many customer projects. With that said, I hope the following analysis provides a good overview of why I think this model works well.

For this analysis, I want to make some assumptions. The scope of an MVP is usually constant – hence the term, minimum viable product. That leaves you with three primary levers to consider: Time, Quality, and Cost, plus of course, the overall Risk of the approach. Below, we explore the various approaches to building an MVP and what they would mean for your start-up. Each approach will be scored on a scale of 1 to 5 across Cost, Time, Quality, and Risk (where 5 is the best possible) and an Overall score will be assigned based on an average of the Cost, Time, Quality, and Risk score.

1. Build it Yourself (Overall score: 2.5/5.0)

There are a lot of low-code and no-code approaches to building simple MVP applications. These can work in some cases until your business complexity requires a more sophisticated system, but expect a significant rewrite at some point. I have personally recommended this type of approach to companies when cost is truly the most important criterion. It allows you to minimize your investment while you validate the idea.

a. Cost (5): This has the lowest possible cost as you are building it yourself. Many people take this route if they cannot afford to hire other people to do the work.

b. Time (1): There is often a significant time span and learning curve required to experiment with the capabilities of tools selected. You are often trading costs for time.

c. Quality (2): This element depends on you. If you're an expert developer, you would probably have an MVP with decent quality. However, for most people, this is like watching a DIY home improvement program where the homeowner does not even realize the impact on the quality until a professional shows them what should have been considered.

d. Risk (2): There is quite a lot of risk to this approach, as you often sacrifice quality and time because you want to cut costs.


2. Build your own Software Development Team (Overall score 2.0/5.0)

You could also decide to hire developers and build a team to work on your MVP. It is common for a startup to want to build its own team; after all, if you think you have a killer idea involving software, it is natural to want your own team to build it. Just like building a house, there are many skillsets required to build a software product – even an MVP. Building an initial MVP requires more than just a couple of developers. You will need expertise around architecture, requirements, testing, development tools & processes, and much more.

a. Cost (1): This has very high upfront costs. You typically need to hire multiple people with different skills to build a quality MVP. In today’s job market, these resources can be expensive and hard to find.

b. Time (1): It takes a lot longer than most people think to find good resources, identify the development tools and processes, and build the actual MVP.

c. Quality (4): You will likely get a quality MVP if you build the right team and processes first.

d. Risk (2): Your MVP’s success rests on your ability to hire and retain a team with the appropriate skill set, knowledge, and attitudes.


3. Outsource with Contract Coders (Overall score 1.8/5.0)

I see this often and have noticed that while it seems like an attractive approach, it usually does not end well. You can find coders online and pay them for the scope of your project. This may work well for small, straightforward projects that have a clear start and finish. Often, however, the work grows to involve multiple independent contractors who do not work together very well.

a. Cost (3): This is relatively low as you don't have to pay the overhead costs for hiring or management plus you can scale up/down as needed.

b. Time (2): It is usually quick to start, but you could soon begin to face issues with developers understanding the project’s scope, defining a holistic architecture, and delivering on time.

c. Quality (1): Even with good developers, the overall quality is often inferior because of interdependencies between contractors. In addition, a lack of focus on the overall architecture usually hurts quality.

d. Risk (1): This can be very risky as it might cost you time and quality in the long run as you end up with a disjointed product. Many times this leads to significant re-writes to improve performance, stability, maintainability, and user experience.


4. Outsource with Offshore Team (Overall score 2.8/5.0)

Many people talk about outsourcing your development to companies in other countries such as India, China, Mexico, or Belarus. There are many cost-effective options, but there are also many challenges.

a. Cost (5): Offshore development companies in India and China can provide development capacity at a very low price. On the other hand, locations like Eastern Europe and Central/South America are typically pricier but may offer better overall results.

b. Time (3): Depending on the processes of the company you outsourced your development to, the timing might be shorter. Often offshore companies can scale up a team much faster than you can hire, and the team will have recommendations for tools and processes. On the other hand, some offshore companies tend to have a “never-ending project” syndrome where the final deliverable seems to slip repeatedly.

c. Quality (2): This is where challenges often occur. Quality implies the product works as designed and as intended for the MVP.  It is difficult to ensure quality remotely, especially if you are not familiar with quality development practices.

d. Risk (1): Selecting the right offshore contracting company is difficult and could be costly if you make the wrong choice. There are hundreds (possibly thousands) of options. They all claim to have industry leading best practices and the experience to make you successful. There is often miscommunication around what the MVP needs and its delivery. Between language, culture, and time zone differences, miscommunication is common, especially when you do not have experience working with offshore teams.

5. Outsource with Managed Offshore Team (Overall score 4.3/5.0)

A Managed Offshore Team means you have senior leadership in the US who provide the technical expertise for the project. They typically also offer the practical knowledge to best leverage offshore resources. The onshore management team will include the following:

  • A senior development leader who would be like a VP of Development for most companies
  • A senior architect who provides the technical expertise across the project

Based on the size of the project, these may be fractional resources. This means you’re getting senior US-based leadership combined with offshore development costs. If done well, it can deliver the best of both worlds.

NOTE: Many offshore teams will have a US-based resource assigned as part of your project. In my experience, these are typically not the same as the “Managed Offshore” resources described here. Typically, offshore teams assign a US-based account management role to the project rather than an industry veteran with more than 20 years of experience running projects.

a. Cost (3): While not the lowest cost option, you can save a considerable amount on staffing by having a blended team.

b. Time (5): With this approach, you essentially hire a VP of Development who brings a development team ready for your project.

c. Quality (4): The quality usually depends on the repeatable and automated processes you have established. With a great process and collaboration, you often get the best quality.

d. Risk (5): A strong seasoned leadership team with repeatable and often automated best practices that leverages strong offshore development teams with cost-effective rates can significantly reduce your project risks.


Final Thoughts

While there are several approaches to creating an MVP, you must carefully choose the one that best suits you. There is not a single best answer for all cases, and you will need to determine which is best for you. The table below summarizes the scores for each approach:
MVP Table

If cost is your primary driver, you may want to consider the ‘Build it Yourself’ option or a good ‘Outsource with Offshore Team’ option. However, if you want to mitigate your risk, I recommend the ‘Outsource with Managed Offshore’ model that provides the best of both worlds.

At Calavista, we have been providing complete, managed teams that are specifically configured to address a customer’s needs and follow the ‘Outsource with Managed Offshore’ model described above. Every engagement is led by a Solutions Director (SD) – professionals who have 20+ years of development experience, multiple degrees from top-ranked schools, and specific, demonstrated expertise in managing distributed teams. We use a hybrid, Hyper-Agile® development methodology that we’ve refined over the last two decades. These factors enable us to deliver projects with a greater than 94% success rate – 3x the industry average. If you would like to talk about how this could work for you, please let us know!


DevOps Methodology Explained: Why is DevOps Right For Your Organization?

Written By: Daniel Kulvicki, Solutions Director at Calavista

In the last decade, we have seen significant shifts in software development operations. One of these shifts is the evolution of DevOps, which came into play around 2008-2009. Even as organizations continue to adopt the practice, DevOps is still treated as an extra when it should be a fundamental. In this article, we are going to explore DevOps and why every organization should adopt it.

  • What is DevOps?
  • How does DevOps work?
  • Calavista Tenets of DevOps

What is DevOps?

For years, developers and most system operations teams worked separately. If you have ever worked in a software development company, you know that these departments don't always agree, and their disagreements can delay the development and productivity of an organization.

DevOps is a term used to describe the collaborative effort between developers and the IT operations team to improve the application development and release process. DevOps is derived from ‘Developers’ and ‘Operations.’ It involves agile development methodology and Enterprise Service Management (ESM).

Agile development is a software development approach that focuses on collaboration between cross-functional teams to enhance rapid releases. This approach aims to help the development teams keep up with the fast-evolving market dynamics in software development. Like DevOps, changes in the agile development processes are incorporated continuously to produce a usable version of the application faster without compromising the quality of the output.

DevOps uses agile strategies, including collaboration and automation. However, it also includes the operations team who manage the application in production.

ESM applies IT system management such as system monitoring, automation, and configuration to improve performance, efficiency, and service delivery. This is the practice that brings the operations team to DevOps.

In the traditional software development process, the developers would produce the code and hand it over to the operations team. The IT operators would build the application from the code and then proceed to testing and production. In case of errors, or when a client requested changes, the application would go back to the developers, and the cycle would repeat.

DevOps changed all this through various practices, including Continuous Integration and Continuous Delivery (CI/CD). Continuous Integration allows developers to submit their code into a shared repository several times a day and throughout development. It is a cost-effective way to identify bugs via automation tools. This increases efficiency since the developers will fix the bugs at the earliest chance. However, it is essential to note that CI does not get rid of the bugs. Instead, it is a principle that aids developers in identifying bugs so they can fix them in a timely manner.

Continuous Delivery makes app or software deployment a lot more predictable. It ensures that the code remains in a deployable state even as developers work to introduce new features or make configuration changes. As with CI, Continuous Delivery enhances frequent app releases with limited instability and security issues. Thus, Continuous Delivery increases not only efficiency but also the quality of the application/software.

CI/CD go hand in hand. They allow developers to make code changes frequently and release them to the operations and quality assurance teams. Together, Continuous Integration and Continuous Delivery increase the rate of production and the quality of applications.

In DevOps, the team works together from development to deployment. This enables organizations to work on multiple projects simultaneously to produce high-quality applications and software.

How does DevOps work?

Developers and system operators differ on a lot of things. But, at the same time, the two teams must work together to successfully deliver software development changes to the end-user (developers write code, and operations gets the code to the end-user). Remember, customer satisfaction is the backbone of any software development organization and this is where DevOps comes in. While it's not easy, bringing both teams together will ease the development and rollout processes.

Let me explain how DevOps works with an example.

Consider a typical software development team working on an application. The team includes software developers, software testers, DevOps engineer(s), and other roles like scrum master and business analyst. The software developers write new code and make changes to existing code. The software testers ensure the code works as designed – often, this will include automated tests. The DevOps engineers make sure everything comes together and automate as much as possible. They typically stand up the development tools and environments, implement a process to “build” the code that typically includes the automated testing, and provide metrics on the development process and the application. The primary goal is to efficiently produce an app that is stable and secure, and the DevOps engineers are a critical part of this team.

Software development can be challenging, especially because clients demand changes all the time. However, DevOps teams implement these changes faster through constant collaboration, communication, and automation.

At Calavista, we like to break DevOps into 6 different tenets. This helps identify various areas of focus for our clients.

Calavista Tenets of DevOps

If your company is about to adopt DevOps or simply looking for ways to improve your current development processes, you will need a solid strategy. It will involve bringing cross-functional teams together, which also means changing the work culture. So, how do you go about it? This section highlights the fundamental principles of DevOps.

When we talk about DevOps, we like to break it down into 6 key areas crucial to the success of your company’s development and deployment processes.

  1. Collaboration
  2. Automation
  3. Continuous Integration
  4. Continuous Testing / Test Automation
  5. Continuous Delivery / Continuous Deployment
  6. Continuous Monitoring

As with many other changes, adopting DevOps will require you to develop a repeatable set of steps for the team. What comes first, and what comes second? Your DevOps team needs clearly ordered steps so they can work together.

For instance, in a typical situation, a simple DevOps pipeline would look like this:

Step 1: Developers write the code
Step 2: Both engineers and IT operators compile the code and check for errors
Step 3: The operation team enables testing and quality assurance to validate and verify the code
Step 4: Deployment - The code is moved to a staging and/or production environment

However, different organizations will have other DevOps pipelines. Therefore, it is essential to define these functions for DevOps methodology to succeed. This must also include a brief description of the automation tools you will use to develop and deploy an application.

Defining your DevOps pipeline allows smooth collaboration between the teams through the production cycle.
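As a toy illustration, the four steps above can be modeled as a sequence of stage functions run in order, where each stage must succeed before the next begins. The stage names and their bodies here are purely hypothetical stand-ins for real build, test, and deploy tooling:

```python
# Minimal sketch of a DevOps pipeline runner (illustrative only).
# Each stage is a function returning True on success; the pipeline
# stops at the first failing stage, mirroring how a real CI/CD run halts.

def build():
    print("Compiling code and checking for errors...")
    return True

def run_tests():
    print("Running automated tests...")
    return True

def deploy():
    print("Deploying to a staging environment...")
    return True

def run_pipeline(stages):
    """Run each stage in order; stop and report on the first failure."""
    for stage in stages:
        if not stage():
            print(f"Pipeline failed at stage: {stage.__name__}")
            return False
    print("Pipeline completed successfully.")
    return True

if __name__ == "__main__":
    run_pipeline([build, run_tests, deploy])
```

In a real project these stages would be defined in a CI/CD tool's configuration rather than hand-rolled, but the fail-fast ordering is the same.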


Collaboration

DevOps is built on the principles of collaboration between developers, testers, and system operators. The relationship between these teams will determine your production efficiency. Furthermore, the collaboration goes beyond this core team into your business stakeholders and others to ensure the team is properly building what the end users need, following an agile methodology. Helping everyone work together effectively will help you deliver great products.

This means that you might have to change your company culture. DevOps will not work for your organization if your developers and IT team don’t work collaboratively. Remember that it is a strategy that involves constant changes that must be validated in real-time.

DevOps is not only a developers-IT team affair. The stakeholders and management must also join the team so that everyone is on the same page. It is beneficial to both the organization and the team at large.

Good collaboration across the development and operations team and the broader organization is crucial to delivering outstanding software products.


Automation

Successful DevOps implementation relies heavily on automation. Therefore, it is essential to use the right test automation frameworks and tools to automate extensive sections of the development and deployment pipeline. Automation is a combination of tools and scripting that implements a cohesive and efficient development environment, and it can include any part of the development, testing, and operations activities, including onboarding.

So, what is Automation in DevOps? It is the use of advanced tools to perform tasks with minimal human intervention. However, automation does not get rid of the human functions in DevOps. Instead, it enhances the entire DevOps pipeline to allow quick releases. We will outline the benefits of automation in DevOps. But, first, let’s define the DevOps processes that can be automated.

DevOps Processes to Automate

Ideally, you can automate all the processes of DevOps, but you usually do not have the time to automate everything. Your automation infrastructure will vary from that of another company based on your specific requirements. When thinking about automation, we recommend prioritizing the following processes:

  • CI/CD
  • Testing
  • Continuous Monitoring

DevOps automation begins with code generation. Next, multiple developers will submit their code into the source code repository, requiring Continuous Integration (CI). At this stage, your automation tool should detect bugs and any inconsistencies in the code. This makes it easy for developers to fix the bugs as soon as the system identifies them. Automation also enhances Continuous Delivery (CD) by allowing frequent configuration of new features and other changes to the code. As a result, it is easier to keep the code in a deployable condition at all stages.

Accurate testing is crucial to software development. Automation tools run testing automatically and as frequently as the developers check the code into the repository. With automation, code testing runs throughout the software development cycles. Therefore, it is less likely for a company to release unstable or erroneous applications using automation.

We will look at these processes in the following sections of this article. All in all, automation is one of the crucial principles of DevOps. Below are some of the benefits of automation in DevOps:

  • Improved app reliability
  • Greater accuracy during code generation
  • Enhanced collaboration between teams
  • Reduced production costs, since fewer staff are needed for development, testing, and deployment
  • Improved overall quality of the app

Continuous Integration

Continuous Integration enables developers to submit and merge their individually written/modified code into the shared main code branch. For instance, once the product road map is laid out, code generation starts. In most cases, developers will begin with the most critical parts of the source code. Therefore, Continuous Integration requires that individual developers submit and merge their code into a shared repository several times a day.

These changes go through automated testing to identify bugs. Developers will fix any detected bugs as soon as possible to keep the code ready for the main codebase. This allows for smooth workflow and consistency through regular adjustments to the code to meet the set validation criteria. Continuous Integration is also an essential step towards Continuous Delivery, a process that we shall focus on later.

Why is CI so significant in DevOps?

DevOps is based on a set of strategies that fuel software development processes for speedy and incremental release. Continuous Integration allows multiple developers to write and merge their code into a shared codebase. This builds transparency and accuracy throughout the development lifecycle. It ensures that everyone is on the same page during the code generation stage of software development. It also promotes collaboration between the involved departments in an organization.

Through automated testing, Continuous Integration enhances the quality of the end-product. Errors and bugs are identified and fixed early enough before checking in the code into the main codebase.

Continuous Integration starts with building the most critical code to lay out the product roadmap. This is followed by automated tests on the code in the shared repository before merging the code into the main codebase. Everyone on the team must update their code multiple times a day. Remember that CI is all about keeping the code in a deployable state at all times. Upon testing, developers should focus on fixing any bugs in the code as soon as they are detected.

Continuous Integration takes us to the next critical principle of DevOps: Continuous Testing / Test Automation.

Continuous Testing / Test Automation

Earlier, we talked about how critical automation is as a component of DevOps. Automation starts as soon as the developers begin writing code and runs throughout the development lifecycle. Continuous Testing goes side by side with Continuous Integration. It involves continuously reviewing the minor changes integrated into the codebase to enhance the quality of the product.

Continuous Testing is an excellent way to attain a frequent feedback loop on business risks at all stages of development. First, developers merge the code, and the quality assurance team takes over through test automation. Unlike traditional software development methodologies, the QA team doesn’t wait for developers to write all their code. For many companies, test cases are actually written before the code, in what is called Test-Driven Development (TDD), and the test cases will simply fail until the code is written. In all cases, testing needs to happen as soon as the code gets to the shared repository. At this stage, bugs and errors are detected, and developers can fix them immediately.
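The Test-Driven Development flow mentioned above can be shown in miniature: the test is written first and defines the expected behavior, and the production code is then implemented until the test passes. The discount function and its rules here are invented purely for illustration:

```python
# TDD in miniature. In practice the test below would be written first
# and would fail; apply_discount is then implemented to satisfy it.
# (The discount rules are hypothetical, not from any real codebase.)

def apply_discount(price, percent):
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 20) == 80.0   # 20% off 100 is 80
    assert apply_discount(59.99, 0) == 59.99   # 0% off leaves price unchanged

test_apply_discount()
```

In a CI pipeline, a test like this would run automatically on every check-in to the shared repository, so a regression is caught on the commit that introduced it.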

Continuous Testing puts customer satisfaction in the mind of the developers at the idea stage of development. The quality assurance team uses Test Automation to check for software stability, performance, and security threats when developers integrate changes into the standard repository. Thus, Continuous Testing enhances the quality of the application and speeds up the development process.

An organization will need to develop a Test Automation strategy when laying down a software development roadmap. It can include unit testing, UI testing, load testing, API Integration testing, among others. Test Automation plans vary from one organization to another depending on the DevOps pipeline and selected metrics.

Continuous Testing and Test Automation bridge the gap between Continuous Integration (CI) and Continuous Delivery (CD). Upon testing and fixing detected bugs, the operations team proceeds to Continuous Delivery, a process that ensures the code remains in a deployable state throughout the development lifecycle.

Continuous Delivery / Continuous Deployment

Continuous Delivery is the logical next step after Continuous Integration and Continuous Testing and is an integral part of almost every Calavista project. Automating the delivery mechanism massively reduces delivery issues, delivery effort, and delivery time. However, please note that even though the delivery is automated, manual release mechanisms may still be in place for moving a release from one environment to another – especially customer acceptance and production environments. Continuous Deployment automates even these delivery steps. This process goes hand in hand with Continuous Integration. Minor changes are integrated into the central code repository in a deployable state throughout the development lifecycle upon testing the code.

In other words, Continuous Delivery (CD) is a combination of the processes we have discussed, i.e., building the code, testing, and merging the code into the main codebase in short cycles. In this step, the software is only a push-button away from deployment (release to the end-user). At this phase, the team reviews the code changes and verifies/validates them to ensure that the application is ready for release. When the criteria are met, the code is pushed to a production environment.

The changes incorporated in the CI/CD phase are released to the customer in the Continuous Deployment phase. The code changes go through Test Automation to check for quality and stability before releasing them to the production environment. Continuous Deployment focuses on customer or End-user satisfaction. For instance, if a user makes a bug report, developers can make changes to the code. The changes will be automatically deployed upon passing an automated test. The deployment will fail if the newly written code does not pass the test. Therefore, Continuous Deployment reduces the inaccuracy from recently applied code and thereby maintains the software's stability. Continuous Deployment is the process that enables developers to add new features or updates to a live application.

In short, Continuous Deployment uses Test Automation to check the stability of newly integrated code changes before releasing the software to the client or end-user.

Continuous Monitoring

Continuous Monitoring (CM) is usually the last phase of the DevOps pipeline. However, it is just as important as any other phase. The continuous model of operations means that code changes occur rapidly. As a result, Continuous Monitoring gives the DevOps team proper insight into what their system is doing and how it is operating.

So how does CM work?

DevOps engineers are required to help other teams support and maintain their application. Continuous Monitoring is put into place to enable support to be proactive instead of reactive. Metrics and KPIs are introduced for Continuous Monitoring to enable more visibility and insight into how the production code is both developed and running. In addition to metrics, centralized logging is usually put into place to expedite the diagnosis of issues. These tools bring together a way to monitor all aspects of an application in support of creating a better product.
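As a small illustration of the metrics side described above, an application might record call counts and latencies for key operations so a monitoring dashboard can aggregate them. The operation name and business logic here are invented for the example:

```python
import time
from collections import defaultdict

# Toy metrics collector: counts calls and records latency per operation,
# the kind of raw data a Continuous Monitoring dashboard would aggregate.
# Real systems would export this to a metrics backend instead.

call_counts = defaultdict(int)
latencies = defaultdict(list)

def monitored(name):
    """Decorator that records call count and duration for an operation."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                call_counts[name] += 1
                latencies[name].append(time.perf_counter() - start)
        return inner
    return wrap

@monitored("checkout")
def checkout(order_total):
    return order_total >= 0  # stand-in for real business logic

checkout(42.0)
checkout(10.0)
print(call_counts["checkout"])  # prints 2: two monitored calls so far
```

Pairing counters like these with centralized logging is what lets support teams spot a misbehaving operation before users report it.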

Continuous Monitoring reduces app downtime because the team is aware of the app's performance and potential threats on a continuous basis. In addition, bugs are detected and fixed in real time, enhancing customer satisfaction.

The primary goal of Continuous Monitoring is to ensure maximum performance of the app. This can only be achieved through responding to customer feedback, monitoring user behavior, and deploying changes as often as needed.


I hope this blog has helped you gain a better understanding of DevOps and how we break it out at Calavista. Hopefully on your next project you can reference our Tenets and see how better you can fit DevOps into your organization. Please feel free to reach out as we always love to talk DevOps!

Calavista Joins The DesignSpark Innovation Center

Written By: Daniel Kulvicki, Solutions Director at Calavista

Big news!!! We wanted to share with everyone our amazing new Bryan - College Station location at the DesignSpark Innovation Center. The Innovation Center is the perfect place for aspiring entrepreneurs and long-standing Aggie businesses to coalesce and further grow the startup ecosystem near the Texas A&M campus. Calavista has always loved to bridge the gap from startup idea to product, and we are very excited about the attention being focused on the startup community here. The Aggie graduates at Calavista feel privileged to help the A&M community in any way possible, and this is a great opportunity.

The DesignSpark Innovation Center Photo Tour

Below are some shots of outside the Innovation Center at Lake Walk in Bryan, Texas. What a beautiful building and landscape!

DesignSpark Innovation Center
Courtyard at the DesignSpark Innovation Center


Exterior of the DesignSpark Innovation Center
Photo By: The DesignSpark Innovation Center


DesignSpark Innovation Center Exterior
Photo By: The DesignSpark Innovation Center


In the image below, the building on the right is a great gym with an incredible, professional-quality basketball court. This gym is open to members of the Innovation Center and surrounding businesses. It's a great place to work out and build camaraderie within the area.


DesignSpark Innovation Center Gym
DesignSpark Innovation Center gym exterior.


Check out the Stella Hotel that is right across the street. The Stella Hotel is a great place to stay and have events near the Innovation Center. The tower over the lake offers great views to anyone who climbs the stairs. We also like to hang out at the POV Coffee House next door to the Stella Hotel.


Stella Hotel near the DesignSpark Innovation Center
View of the Stella Hotel near the DesignSpark Innovation Center.


Now to the inside. First up is the Calavista office at the Innovation Center. We are so proud and excited to be right in the middle of the building! We are moving in later this month and will be ready for visitors very soon!


Calavista office at the DesignSpark Innovation Center
Calavista office at The DesignSpark Innovation Center


Below is one example of the artwork in the building. There are great settings throughout the building. Each time I look at this wall, I find another interesting and inspiring quote.


Quote wall at the DesignSpark Innovation Center
Wall of quotes at The DesignSpark Innovation Center.


Here is a good look at the inside of the building. You can see the artwork wall on the left, and our office is just past the shuffleboard table that you see on the right.


Interior of the DesignSpark Innovation Center
Photo by: The DesignSpark Innovation Center


Check out the events that occur at the Innovation Center on a regular basis. Look out for a Calavista employee if you come to an event, as we love to attend these!


Events at the DesignSpark Innovation Center
Recurring events at the DesignSpark Innovation Center.


Although I could share a ton more pictures, we would love for you to come see our office in Bryan for yourself! If you are in, or around Bryan/College Station, feel free to ping me or anyone at Calavista for a meeting at The DesignSpark Innovation Center. We would love to get to know you and tell you how we can help you and your business take the drama out of software development!

Approaches for Generating Realistic Test Data

Written By: Steve Zagieboylo, Senior Architect at Calavista

In this post, I'm going to discuss some approaches to obtaining realistic test data without compromising the security of any customer's sensitive data. In the world of health care software, this is referred to as Protected Health Information (PHI), but the concept exists in financial software, document management, ... really, everything.

Why is Realistic Test Data Important?

The data you test against must include a lot of completely unrealistic data: the corner cases that you have to make sure you handle correctly, but almost never come up in real life. However, it is equally important to have a good quantity of realistic data, justified by, “You just never know.” I can’t count the times that a special case came up in the real data that we hadn’t considered as a corner case. If your application handles large amounts of real-world data, there’s just no substitute for testing it with large amounts of real-world data.

What Aspects are Important?

  • Quantity. Having too little data will hide a lot of performance problems, such as an N+1 problem in your queries. You don’t need the individual developers’ systems to have the same quantity as a production system, but the primary QA server should.
  • Quality. One of Calavista’s customers has a very cool product that does intelligent analysis of the data produced in a hospital stay. This includes checks, for instance, of the drugs prescribed, considering the age of the patient. If the data were just randomly generated with actual drugs and actual dates, but no consideration given to making them ‘realistic’ (i.e., drugs that a doctor would actually have prescribed for a patient with this profile and this diagnosis), then the intelligent analysis code would freak out. Tests created with random data would just be useless.
  • Interconnectedness. The manner and scale in which the different parts of the data model connect should be realistically represented. For example, compare a business in which Customers typically make 3 or 4 orders a year with one where they typically make thousands of orders a year. The approaches to both the UI and the data storage will be quite different.
Approaches to Create Realistic Test Data

There are three general approaches:

  • Generate the data by hand. Someone who is an expert in the system as well as in the specific field would have to create a realistic data set. Even with tools to support the effort, this would be a humongous undertaking, and it would still be a challenge to make the data "realistic," especially medical data. It would need to include Patients and Visits and Symptoms and Diagnoses that all fit correctly together and occur in realistic quantities.
  • Use a tool that generates the data. This is industry specific, of course, but often there are tools that can do some of this work. In the case of medical data, for instance, there are some tools for generating, from whole cloth, data that adheres to the HL7 standard. There are several good choices if all you’re looking for is legal HL7 messages, but nothing that can automatically generate a fully coherent set of data.
  • Start with real data and obfuscate it. This approach is the only one we have been able to find that really solves the problem. The challenge here is to remove all sensitive information from the data but still leave it adequately realistic. This approach is the subject of the rest of this post.
Obfuscation / Data Masking

    PHI data is more than just the names and addresses of the patients. It is important that even a determined investigator not be able to take obfuscated data and reverse the process to figure out whom the data describes. Therefore, not only names but birthdates, dates of visits, locations, and even some specific combinations of symptoms might need to be obscured. However, changing the dates too much might change the analysis of the illness: symptoms of osteoarthritis are much more alarming in a twelve-year-old than the same symptoms in someone who is seventy-three. Proper obfuscation, or data masking, creates characteristically intact, but inauthentic, replicas of personally identifiable data or other highly sensitive data, in order to uphold the complexity and unique characteristics of the data. In this way, tests performed on properly masked data will yield the same results as they would on the authentic dataset.
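As a sketch of what "characteristically intact, but inauthentic" can mean in practice, here is a minimal Python example. The field names are hypothetical, not from any Calavista project. It derives a stable per-patient date shift, so ages and intervals between visits are preserved even though every date changes, and it replaces names with opaque tokens:

```python
import hashlib
from datetime import date, timedelta

def _shift_days(patient_id, max_days=30):
    """Derive a stable per-patient offset so all of one patient's dates
    move together, preserving ages and the intervals between visits."""
    digest = hashlib.sha256(patient_id.encode()).digest()
    shift = digest[0] % (2 * max_days) - max_days  # in [-max_days, max_days)
    return shift if shift != 0 else max_days       # never leave dates unchanged

def mask_patient(record):
    """Return a masked copy of a record: opaque name, consistently shifted dates."""
    shift = timedelta(days=_shift_days(record["patient_id"]))
    masked = dict(record)
    masked["name"] = "Patient-" + hashlib.sha256(record["name"].encode()).hexdigest()[:8]
    masked["birthdate"] = record["birthdate"] + shift
    masked["visit_dates"] = [d + shift for d in record["visit_dates"]]
    return masked
```

Because the shift is derived from the patient ID, running the tool twice produces the same masked output, and a patient's age at each visit is exactly what it was in the real data.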

    Tool Recommendations

    SOCR Data Sifter
    An open-source tool for obfuscation. It has a setting for exactly how obfuscated the data should be, where the lowest setting leaves the data unchanged and the highest setting leaves it unrecognizable.
    Documentation Link
    Source Code Link
    SQL Server Dynamic Data Masking (DDM)
    In SQL Server, as of SQL Server 2016, there is a Dynamic Data Masking feature that allows you to specify data masking rules on the database itself, with no need to change the data. Whether or not a particular user can see the real or the masked data is based on user privileges. Here is the best overview of the feature that I’ve found: Use Dynamic Data Masking to Obfuscate Your Sensitive Data
    Roll Your Own
    I am usually a big advocate of buy over build. The software you imagine is always bug-free and does just what you want; it isn’t until you’ve spent a bunch of time working on it that it acquires the bugs and feature compromises that make you unhappy with the existing tools. In this case, however, it is a pretty small bit of code and purely internal, so if it chokes, you haven’t embarrassed yourself in front of customers. The biggest risk is that you have to make sure that no sensitive data is getting past it, which is pretty straightforward to check. The biggest task is a serious review of the entire database schema to make sure all the important fields are being modified.
    You can randomize the data just with SQL, with clever use of the RAND command. Here is a great article on this approach: Obfuscating Your SQL Server Data.
    If the code for your data model is cleanly isolated in a library, you can make a new application, pull in that library, and build your tool with it, using the same technology your developers are already comfortable with. If your code is not so cleanly isolated, a riskier approach is to write the obfuscation code inside the same code base, where it can only be triggered by a completely isolated command. I’d recommend putting what amounts to two-factor authentication on this code; just think what a disaster it would be if it somehow got built and then run inside the production system. It should only be included if a TOOLS_ONLY flag is set in the build process, and it should only run if the OBFUSCATION_TOOL flag is set in the application configuration. Of course, neither flag should be on by default. Even so, carefully consider the impact that an error -- or even a rogue developer -- could have on a production system before using this approach.
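A minimal sketch of that double gate in Python, assuming (purely for illustration) that both flags are surfaced as environment variables; the source names the flags but not how they are read:

```python
import os

def obfuscation_enabled():
    """Two independent switches -- one set by the build, one in the runtime
    configuration. Both must be on, and neither defaults to on."""
    return (os.environ.get("TOOLS_ONLY") == "1"
            and os.environ.get("OBFUSCATION_TOOL") == "1")

def run_obfuscation(connection):
    """Entry point for the tool; refuses to run unless both flags are set."""
    if not obfuscation_enabled():
        raise RuntimeError("obfuscation tool disabled; refusing to touch data")
    # ... walk the schema and rewrite the sensitive columns via `connection` ...
```

The point of two separate switches is that a single mistake (a stray build flag, a copied config file) is not enough to let the tool run against production.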

    This extra code might need other additional controls on it, as well. For instance, if you have tables that are flagged as Insert-only, you will need to change the settings and add new code to your lowest level code to enable your obfuscation code to write the obfuscated versions of the records. Again, these functions should have two levels of enablement, such that they are not even built in the code that goes to production, and if they accidentally do get built, they won’t run because they do nothing if the OBFUSCATION_TOOL configuration flag is not set.

    These steps should be built into the DevOps tools so they can be run easily, or even are automated to run every day. For one Calavista customer, we set up a process where part of the nightly process was to copy all the production data, obfuscate it, and install it in the QA system. It was incredibly helpful in tracking down tricky bugs, because even though the names and dates were different, the IDs of the records that triggered the problems were consistent, and that was all that was saved into logs.

    The overall process is:

  • Clone the data. This is no harder than taking a recent backup, standing up a new database instance, and restoring from the backup. This MUST be done inside the controlled data space, where only users who are approved to see PHI data have access.
  • Run the obfuscation code on the alternate data set, changing it in place. There are a number of tools for this step, with some recommendations above. Alternatively, you could create these tools yourself.
  • Create a "backup" of the alternate data set, which now contains only obfuscated data. Only this backup can leave the protected zone.
  • Delete the alternate data set.
  • In the non-protected zone where the QA systems live, bring in the alternate backup and restore it to the database where you want this latest data.
Additional Consideration: Logging

    This is more a general thought for any system with sensitive data, but it intersects with this approach. When logging specific record information -- such as when you are logging that an unusual process is being invoked for a certain record -- developers should know to be conscious of what data is sensitive and never put that data in the log. For most applications, it is acceptable to put naked IDs in the logs, since those do not mean anything to someone without access to the database anyway. Fortunately, the IDs are exactly what you want to be able to hand to developers who are going to be debugging against the obfuscated QA database. They still match up to the problem records, whereas the names, addresses, etc. do not.
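A log statement that follows this rule might look like the following Python sketch; the logger name and record fields are invented for illustration:

```python
import logging

log = logging.getLogger("visits")  # logger name is illustrative

def flag_unusual_visit(visit):
    """Log only the opaque record ID; never the patient's name or dates.
    The ID still matches the obfuscated QA copy, since obfuscation leaves
    primary keys untouched."""
    log.warning("unusual billing path for visit id=%s", visit["id"])
```

Note that only `visit["id"]` ever reaches the log line, even though the record in hand contains the sensitive fields too.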


    When dealing with sensitive data of any sort, including health care data, it is important for QA systems to have realistic data upon which to operate. Having only small amounts of hand-generated data will allow bugs and problems to creep in, and will make them very difficult to track down when they are found in the production system. There is really no substitute for getting the real production data and having a copy of it, properly obfuscated, in the QA environment. Calavista recommends performing this step automatically, as part of the nightly process. Even though getting this set up represents work that has to go on the schedule, it more than pays for itself in the long run.

    Challenges of Data Migration

    Written By: Steve Zagieboylo, Senior Architect at Calavista

    In my last blog, I talked about how we estimate new projects, and I included the offhand comment that Data Migration is always harder than you think it will be. The purpose of this blog is to provide a few examples of why I find this to be the case.

    What is Data Migration?

    Data Migration, in the context of web projects, is needed when you are making a major change to the product, such that the data model has significantly changed. Tools like Evolve or Liquibase aren't going to cut it, because those tools are meant for point changes to the application, with small changes to the data model that you could describe in a couple of commands. You need a migration if you are going to be switching your users over to a completely rearchitected version of the application. It’s still the same tool your customers are paying for, but with the major revision you’ve fixed those areas that are clunky, and (hopefully) simplified the abstraction to make lots of new functionality possible.

    However, you still have a ton of data that needs to be transferred to the new model. Your customer wants everything that they already appreciate about your product to remain. Of course, the most important part of that, to them, is their data. If any of their data is lost, it won’t matter how much better the new version of your application is, they will be unhappy with it. Getting the migration right, therefore, is an absolute requirement to keeping your customers happy.

    Practice, Practice, Practice!

    Specific challenges are addressed below, but the first and most important takeaway from this article is that you need to have practiced the migration step dozens of times before you do it for real. If you haven’t done several complete migrations on a true copy of production data, and successfully brought up the new system with the migrated data, then you aren’t ready to try it on production. You need to start with a staging system that is functionally identical to your production system, including versions of peripheral elements and some horizontal scaling of the pieces that scale horizontally. (You can have just three instances instead of nine, but if there are multiple instances in production, you need to test with multiple instances.) You should replicate the entire process on the staging server, all the way from shutting it down (with some active users at the time being politely booted) to switching the DNS entries so that the next users will log on to the new system.

    Testing, Testing, Testing!

    Part of the importance of doing all the practices is the testing that goes along with it. Identify what the key indicators are that represent correct data and automate the comparisons. For instance, if you have Customers who each have Orders, you should be able to get the count of Orders for each Customer, sorted by customer name on both systems, and the arrays should match exactly. It’s no good to migrate all your data if it introduces inaccuracies in the data when you do it.
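A sketch of that kind of automated comparison in Python, using the Customers-and-Orders example; the row shape here is an assumption for illustration, not prescribed by any tool:

```python
from collections import Counter

def order_counts(rows):
    """rows: iterable of (customer_name, order_id) pairs pulled from one
    system. Returns a dict of customer -> order count, which is cheap to
    diff between the old and new systems."""
    return dict(sorted(Counter(name for name, _ in rows).items()))

def assert_migration_preserved(old_rows, new_rows):
    """Fail loudly if any customer's order count changed during migration."""
    old, new = order_counts(old_rows), order_counts(new_rows)
    assert old == new, f"count mismatch: {set(old.items()) ^ set(new.items())}"
```

In practice you would pull the two row sets with equivalent queries against the old and new schemas and run comparisons like this for each key indicator you identified.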


    Data Model Mismatch

    There are lots of ways in which the data model can be a mismatch, and the approach to each is very different.

    1. Indirection Layer Added. Perhaps a many-to-one relationship is becoming many-to-many, or references are being bundled so they can be changed together. You'll need code to create the new layer of indirection.

    2. Data Being Collapsed. You had a hack in the old system where you made extra records because it couldn’t account for some more complex entities that your new data model can represent. There are two stumbling blocks here. First, make sure you’re not double-processing your complex records, once for each sub-part you run into. Second, make sure you compensate for the differences in record counts when you do your testing.

    3. Less-used Columns Abstracted. You may have made a number of columns in a table that are only used in certain cases, and you find yourself making more and more of these. Instead, you roll them all into a separate table of named properties.

    4. Special Cases Abstracted. Special cases, such as dependency requirements, which used to exist in the code with hard-coded references are now abstracted in the data. Not only do these need to be captured correctly, but old data that didn’t meet the dependency but was “grandfathered in” now has to be handled. (This is a very real and very painful issue that came up for us recently. Our solution was a spreadsheet import that referenced specific records by ID that had to be modified as they were transferred.)

    5. Old Data no Longer Tracked. In a complete rewrite of a client’s product, there were some old features which were no longer supported, but we did not want to lose the old data associated with them. It would only be referenced by internal users in unusual circumstances, but it shouldn’t disappear. Our solution was to dump it into a text file and keep it as ‘attached documentation,’ a field we had anyway.
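To make one of these concrete, here is a small Python sketch of case 3, rolling rarely-used columns into a separate table of named properties. The column and table names are invented for illustration:

```python
# Hypothetical set of rarely-used columns being pulled off the old orders table.
SPARSE_COLUMNS = ("gift_message", "po_number", "delivery_window")

def explode_sparse_columns(order_row):
    """Split one old-model row into its core columns plus rows destined for
    a new order_properties table; NULL columns simply produce no row."""
    core = {k: v for k, v in order_row.items() if k not in SPARSE_COLUMNS}
    props = [{"order_id": order_row["id"], "name": col, "value": order_row[col]}
             for col in SPARSE_COLUMNS if order_row.get(col) is not None]
    return core, props
```

Note the asymmetry this creates for testing: the new system has fewer columns but more rows, so the record-count comparisons have to account for it, just as case 2 warns.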


    Sheer Quantity of Data

    You will need to know how long it will take to migrate the data to know how long you’ll be shutting down the system. Of course, by the time you’re doing it for real, you’ll have practiced the whole process many times, and you’ll know exactly. But before you get there, you might want to perform an experiment or two to get an idea of how long it will take. This might surprise you, and it may affect the entire approach. (See the sidebar, "Data Migration Anecdote.")

    If you find that your migration is too slow, there are some tricks you can do.

    • Bigger Transactions. If you wrap each processed row in its own transaction, you spend a lot of time starting and stopping transactions. Instead, bundle a group of rows into a single transaction. You don’t want the bundles to get too large, though, because one error means the whole transaction has to be rolled back. A bundle of 50 or 100 rows will keep the transaction overhead small compared to the real work.
    • Multi-Process. If you are reading from one database, doing work, then writing to another, all very serially, there are two ways you can process in parallel that are pretty easy to do. First, if your data is pretty independent, you can break it up according to the key of the source data and kick off several threads, each of which is assigned a block of the source data. Second -- and this turns out to be easier than you think -- you can read and process in one thread and write in another. Either way, the goal is to make the database the bottleneck (typically the one you’re writing to will max out first), rather than your processes. Be careful, because you don’t want to create too many threads, all eating up memory, when all they are doing is waiting for their chance to write to the database. You really only need two, so that one thread is reading from A while the other is writing to B, and vice-versa. If there is a fair bit of processing, perhaps use three threads.
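The batching trick above can be sketched in a few lines of Python. The function names are mine, not from any particular framework; `write_batch` is where the single-transaction write against the new database would happen:

```python
from itertools import islice

def batches(rows, size=100):
    """Group rows so each group can be committed in a single transaction,
    instead of paying begin/commit overhead on every row."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def migrate(read_rows, transform, write_batch, batch_size=100):
    """read_rows: iterator over the old system's rows; transform: maps an
    old row to the new model; write_batch: writes one list of transformed
    rows to the new system inside one transaction."""
    for batch in batches(read_rows, batch_size):
        write_batch([transform(row) for row in batch])
```

For the two-thread variant, a bounded `queue.Queue` between a reader thread feeding batches and a writer thread committing them keeps only a couple of batches in memory at once.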


    Corrupted Data

    If you’re migrating from a system that has lived and evolved for several years, don’t assume that the data is perfect. It isn’t. Minor errors have crept in, from failed operations that weren’t properly in a transaction that would roll back; from hand-editing the SQL; from requirements changes on the data, where old records are being ignored; from who knows what. Your migration had best not choke and die on some corrupted data. Your first full pass through the data will probably die, and your second, and … That's why we practice the migration several times.


    Data Changing While Migration is Occurring

    If you do not have the luxury of shutting down the application while you are migrating the data and moving to the new system, then you must deal with the possibility of data changing after the migration has started. This is not too bad if you have immutable data that is all time-stamped, but if you had thought that hard about your data model before, would you really be migrating off of it? There is no one-size-fits-all answer here, but consider these approaches.

    • Take a backup, restore to another instance, and migrate from that. At least then you’ll have a clean snapshot to work from and a very clear timestamp.
    • Try harder to convince the bosses that access to the system can be shut down from 3 to 6 am on Sunday morning. It’s a horrible weekend for you, but it’s better than data loss.
    • If all data-changing events are done through an event bus, you can replay the feed (probably through its own translation to the new system), but if you had that architecture, I wouldn't have to tell you.
    • A poor man’s version of the last point: Add code to the old system that writes all the changes to a file at the same time it saves them, such that you can read the file back and replay it. Combined with the first bullet point, this can give you a safe transfer.
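That "poor man's" journal from the last bullet might look something like this Python sketch, with one JSON object per line; the event shape is an assumption for illustration:

```python
import json

def record_change(journal, table, row_id, column, value):
    """Old-system hook: append each write to a replayable journal file at the
    same time it is saved to the old database (one JSON object per line)."""
    journal.write(json.dumps({"table": table, "id": row_id,
                              "column": column, "value": value}) + "\n")

def replay_changes(journal, apply_change):
    """After migrating the snapshot, replay everything captured since the
    backup; apply_change translates one old-model event to the new model."""
    for line in journal:
        if line.strip():
            apply_change(json.loads(line))
```

Start the journal just before taking the backup in the first bullet; anything captured after the snapshot gets replayed into the new system once the bulk migration finishes.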



    In addition to the coding effort for the migration, there’s going to be a big DevOps challenge for coordinating the migration and then the entire changeover from the old production system to the new one. Part of the practice has to include automating the entire process, from creating the database instances, firing up the migration code with the correct accesses to old and new databases, executing it, firing up the new application code, and switching DNS or load balancer to direct new connections to the new code. If any of these are being done manually, make sure there is a checklist, and stick to it religiously in your practices as well as, of course, in the real thing. In the final practices, don't let the DevOps person do any task that isn't on the checklist, no matter how obvious and everyday the task is.


    When you’re making a major change to an entirely new version of your web application, you are already going to have some users who are metathesiophobic (afraid of change -- I had to google it), so they are already likely to grumble. The last thing you can afford is to have data issues as well. You need to budget lots of extra time to make sure that the data migration goes smoothly, and you need to practice the whole process, end to end, several times before the D-Day when you do it on the production system. Have a well-rehearsed process governed by a checklist that is so complete anyone could follow it. And GOOD LUCK!


    Estimating Software Projects in an Agile World

    Written By: Steve Zagieboylo, Senior Architect at Calavista

    Calavista boasts an impressive 90+% on-time in-budget track record. As I pointed out in an earlier blog, I consider this SEVEN times better than the industry average of 30%. And yet, Calavista is very firmly an agile shop -- we know the folly of waterfall methodology, especially for greenfield projects. It doesn’t really work for real projects that real people care about, because a lot of the decisions you make in designing software can’t really be made without putting it in front of real users. Many people will claim that you can’t really do accurate estimations in an agile environment, because you can’t estimate if you don’t know the final target, but those people are wrong.

    Lawrence Waugh (CEO and Founder of Calavista) likes to compare the process to house construction. Let’s say you have purchased some land, and you want to build a house. You go to an experienced house builder and you say you want a 3-bedroom colonial with 2.5 bathrooms, and you want to know how much it will cost. After looking at the lot and asking you a few questions -- room sizes, dining room, quality of furnishings and finish -- he will give you a number. It will be a suspiciously round number, but once you work out all the details, it will turn out to be pretty darn close. The contractor can do this because he has done it a lot of times before, and he knows the sort of basic costs and the sorts of problems that can crop up, and he makes an allowance for a reasonable number of them. In other words, he has built houses much like the one you want to build, and he knows what the process is going to be.

    Software is similar, though there is a much wider range of targets than in house building. Since starting at Calavista, I’ve worked on a project that was done in 4 months with a team of 5 people, and I’ve worked on a project that was 18 months reaching MVP (and continued from there) and peaked at 35 people. So how do we estimate such a wide variety of projects?


    Understand the Goals and the Scope

    When we are making an estimate for a new piece of software, the first step, of course, is to understand what it is supposed to accomplish. This is different from “what it is supposed to do.” The latter question leads down a dangerous path which ends in waterfall specs. We are not yet trying to document user flow or data models; we are just trying to capture the basics of who the users are and why they will be using the software. Who the users are might be equivalent to roles -- it certainly should include all the roles -- but might include a few more. The “accomplishment” level of specificity is closer to “Manage the list of items in the store” than the specifics of adding/removing items, changing the prices, etc. This is the equivalent of learning that the house is to have 3 bedrooms or 5.

    List Major Areas of Activity

    For each role, list the things they want to get done using the software. This is a subset of the stories you will end up with -- just the important ones, the happy path. A complete set of stories includes all the odd cases -- e.g., users wanting to back up and start over in a process -- but I’m not going to get to that level of detail in making an estimate. I know from the dozens (hundreds?) of times that I have created user flows that these extra stories will always be there, and my experience in creating them includes those, so my estimate will, as well. This is the equivalent of learning that there will be 2.5 bathrooms, a large dining room, and a swimming pool.

    Sketch Out the Data Model

    Your mileage may vary on this step, but I continue to be a “Data First” architect -- that’s just how I think. I make a simplified Entity Relationship Diagram (ERD), not a full-blown database diagram, and I don’t imagine that it is 100% accurate, or even 60%, but it leaves me in a state of stress if I don’t at least scribble this down on paper. If I can feel the shape of the data as it fills and flows, that helps me complete the next step.

    List Out All the Screens

    Next, I write down all the screens that the app will present. In modern Single Page Apps, or hybrids where a lot happens on a single page, this is not as clear as it used to be, but it works just as well to think of it as a multi-page app with relatively little reuse of screen real estate on a page. The app hasn’t actually been designed yet, so I’m not really talking about literal screens that will be presented to a user, necessarily. It’s really just the functionality of user interaction, and these pieces might go together very differently but will still end up having all the parts I’m considering.

    A UI-centric architect might do this before the previous step, and be perfectly successful doing it; it’s a matter of preference.

    For each screen, I enter three values: back end cost (in person-days), front end cost (also in person-days), and a fudge factor that ranges from 2 to 10. Having lots of fields and a complicated layout increases the front end cost. Having both data to be fetched and data to be saved increases the back end cost, and the complexity of the data needed and saved increases it further. (This is where my data model helps.) I don’t get down to the individual field level, but I try to be aware of ‘tricky’ fields, such as a field that will have to include a type-ahead against data that is filtered by selections in other fields.

    The fudge factor is usually around 4, but these things increase it:

    • This part of the data model is not that well understood, so I expect changes.
    • This part of the user interaction is not that well understood.
    • There’s a ‘science project’ involved, such as interaction with some outside library that I’ve never used before.
    • There is workflow. It’s always harder than you think.
    • There is more than one sub-table involved, where list management is needed at multiple levels.  For example, Customers have Orders and Orders have Items Ordered and Items have Features (size, color, quantity, etc.)
    • There is user input we can’t completely control, such as an upload of a spreadsheet for batch processing. Users have an amazing ability to mess these things up in unexpected ways.

    To continue the analogy to house building, we’ve now specified the sizes of the rooms (more or less), the size of the pool, the length of the driveway, and the quality of the appliances. We are not going to get any closer than this in the estimation level. Once we actually start the project, that’s when we’ll do the equivalent of hiring the architect and putting together the completed plans, including all the details -- closets, plumbing, vents, etc.

    Add Non-Interaction Features

    I add these in the same list, but they have a 0 for the front end cost. These are things like emails that get sent, background tasks that perform data maintenance, batch processing tasks, data migration from an earlier version of the product, etc. Data migration, by the way, is always way harder than you think; just go ahead and double it, AND add a large fudge factor.

    This is also the step where I think hard about the non-functional requirements, such as performance, security models, time zone issues. If any of these are outside the bounds of the normal stuff I’ve done several times, I’ll add extra tasks to cover them. I don’t mean to imply that they are necessarily small afterthoughts just because they come at the end. Many of these are pretty significant tasks, possibly even being a proportional overlay to the existing work, such as multi-tenancy, which more or less doubles all the back end costs.

    Apply the Formula

    My formula for the actual costs is:

    (base cost) * (1.25 + (fudge factor)/5)

    The result is that each task with a fudge factor of 4 gets approximately doubled, and higher or lower fudge factors scale it a little more or less than that. I’ve found, just from experience, that the time I expect a task to take needs to be doubled to cover actually completing the task, with completed unit tests, responses to code reviews, and getting it merged into the development branch. The 1.25 is a factor to account for developers spending time in meetings, as well as time lost to vacations, holidays, and sick time.
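The formula is easy to turn into code. A small Python sketch using the formula above (a fudge factor of 4 gives a multiplier of 1.25 + 4/5 = 2.05, i.e., roughly doubled); the function names are mine:

```python
def task_cost(base_days, fudge):
    """Estimated cost of one task: base estimate in person-days times
    (1.25 + fudge/5). A fudge factor of 4 roughly doubles the base."""
    return base_days * (1.25 + fudge / 5)

def project_totals(screens):
    """screens: list of (back_end_days, front_end_days, fudge) triples,
    one per screen. Returns total back end and front end person-days
    after applying the formula to each."""
    back = sum(task_cost(b, f) for b, _, f in screens)
    front = sum(task_cost(fe, f) for _, fe, f in screens)
    return back, front
```

Those two totals are what feed the next step: dividing person-days by team size and calendar time to propose staffing scenarios.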

    Estimate Personnel vs. Time

    Next, we make one or more plans with people and deadlines, such as “4 back end, 3 front end for 8 months,” taking the requests of the customer into consideration. If they are looking to meet a hard deadline, then we figure out the number of front end and back end engineers we will need to get all the work done before the deadline. If they have been less specific, we might put together a few scenarios with different team sizes and different target dates.

    Add the Other Parts

    After that, we add QA, Requirements Analyst(s), UI/UX, DevOps, and the time for Calavista Solution Director and Architect who together will shepherd the project. We usually also include a “Sprint Zero” for kicking off the project, getting source control, Jira, Confluence, Communications, and the rest of the development infrastructure into place. Some of these are proportional to the amount of work, some are fixed costs.

    Make a Proposal

    Finally, we bring this all together into a proposal. We know that if we get the contract, we will have a framework and enough time to create something that will accomplish the customer’s goals. There will be changes, of course, to the different pieces: the pages, the layouts, the secondary goals and tasks, and the delivery; but with this process we have been very successful in delivering software that makes our customers successful, within the time we’ve actually predicted, which is our ultimate goal.

    It Was All Over After Punch Cards

    Written By: Lawrence Waugh, Founder at Calavista

    I don't feel old, but when I look around at other people in the IT industry, I realize I'm a dinosaur.

    How old a dinosaur? My first programming job (the summer after my sophomore year at MIT) was writing COBOL code to cull trends from Polaroid's customer registration "database" - really just one big flat file. I have to shake my head at the sheer number of anachronisms in that one sentence.

    The first computer I programmed, as a sophomore in High School, had 4K of RAM, and you loaded programs via punched tape. Half of that 4K was consumed by the BASIC interpreter, leaving you 2K to work with on your program. There was no swap space - there was no hard drive to swap with. So the 2K was what you had. You could be typing in a program (on a console that printed directly to a paper roll, not a CRT), and you'd actually get an OUT OF MEMORY response when your program got too big. Programs weren't stored in some representational form - they were stored as literal strings of characters and interpreted later - so code brevity really mattered. I remember shortening the output strings of a program to save a hundred bytes or so, so that the entire program could fit in memory. That sounds like a Dilbert cartoon, but it's true. Concise code became an art form for me.

    But then I graduated to some serious computing power. The town's computer was housed in my school, so in Comp Sci 2, we got to use it, and its awesome 8K of RAM. Of course, 2K was reserved for the FORTRAN interpreter, but that still left essentially 3x the heap and stack space to write code. It also had a Winchester Fixed Drive, but that was some magical thing that we never got to play with. We had to submit our FORTRAN programs on punch cards, laboriously typed out on a punch card machine.

    I learned a lot from that, believe it or not.

    We would have one assignment per week. The "jobs" we submitted - a deck of cards wrapped in a rubber band, placed in a cardboard box in the server room - would be executed at night, after the town's work was done. The jobs included cards for the data, so that the computer would read the program, then read the data, and then do its thing. Mr. Hafner, a tall, bespectacled man, would come back after dinner, load a deck, and press the "execute" button. The computer would suck in the cards, then print out the results. Whatever they were. He'd then take that deck, wrap it in the output (that venerable fan-fold, wide, striped paper), and put it back in the box. We'd come in the next morning and rummage through the box for our output.

    So that's exactly five attempts to get your program exactly correct, assuming you tried to run your program the same day it was assigned. More likely, you'd spend a few days writing the program before trying to execute it. So 2, or maybe 3, tries at best.

    Now imagine coming in the day before a project is due, picking up your output, and seeing:

    Syntax Error on line 18

    D'oh! A day's effort lost, with no indication of whether or not - once the stray comma on line 18 was corrected - your program would even compile, let alone run correctly. It all depended on the next night's run. Last chance.

    It didn't take many F's on assignments before you got very, very careful about your coding. Syntax errors were one thing - but algorithmic errors were another. The teacher, Mrs. Sheldon, had a set of data she'd feed your program. Running the program once or twice on trivial data wouldn't catch most of the errors. So you sat down and flowcharted things out. You thought up edge cases. You compared algorithms with your friends'. You shot holes in their ideas, and defended your own. You read, and re-read, your punched cards. You'd swap decks and read each others' work, in case your eyes might catch something your friends missed.

    In short, because we only had a few tries to get it perfect, the cost of a mistake - whether design, implementation, or syntax - was grave. And because the cost was so grave, we killed ourselves to make sure we didn't make mistakes. As a result, we did the kind of design, peer review, and QA work that most development shops today would be proud of. We were barely teenagers, writing complex code, working on antiquated equipment, writing everything from scratch. But our code almost always compiled, and ran correctly, the first time.

    When I was a senior, the high school got a CMS system, with some CRT terminals. We also got an Apple IIe. Now I could type my code in and run it, on the spot. I never touched a punch card again.

    And my coding immediately got sloppier. I started typing before I'd finished thinking. I started trying to compile before I'd finished coding. But worst of all, I started coding before I'd really designed. Sometimes things just wouldn't work, and I'd have to start over. But the more insidious errors crept in when my code would almost work correctly. It would work in the obvious way, but I wouldn't have spent the time to think through the edge cases, or complex numbers as input, or just the unexpected.

    Over the years, I've tried to discipline myself to not work that way. I've used PSP, enforced estimates, allocated specific time to design and peer review... but at the end of the day, my keyboard beckons. And it's hard to not want to start typing when you're excited about a project. "Why not just try this, and see if it works?"

    And when I do that, I invariably write inferior code.

    At Calavista, we have a process where the developer doesn't commit code directly. Instead, they create a regression test that demonstrates that their code works, and then tells our system that they're ready to "close" the issue. The system might check to see if another developer has diff'ed their changes in the past 24 hours. If so, it will take their new code, merge it with any recent changes to the code base, check that code into a temporary sandbox, build the product from source, create an installable, install it, and smoke-check the result.

    Then, it will run the developer's new regression tests to ensure that what they set out to do actually works. Finally, it will run every regression test ever written as part of previous closes, to make sure all of those still work. Depending on the project, it may also run performance tests, or code coverage analysis, or any one of several other tests.

    Only then, when all tests have passed - when everything has worked flawlessly - does the code get promoted into the main line (or root, or master branch, or whatever), and made available for manual testing. Which is a whole different story.
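    The gate described above boils down to a chain of checks that must all pass before promotion; a simplified model (the stage names are illustrative, and the real system also merges, builds an installable, and installs between checks):

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class MergeGate {
    // Run each stage in order; the first failure stops promotion.
    static boolean canPromote(List<BooleanSupplier> stages) {
        for (BooleanSupplier stage : stages) {
            if (!stage.getAsBoolean()) {
                return false; // fail fast: this commit is not promoted
            }
        }
        return true; // every check passed: the commit is a release candidate
    }

    public static void main(String[] args) {
        // The stages mirror the steps in the process described above.
        boolean promoted = canPromote(List.of(
            () -> true,  // build from source succeeded
            () -> true,  // smoke check of the installed product passed
            () -> true,  // the developer's new regression tests passed
            () -> true   // every previously written regression test passed
        ));
        System.out.println(promoted);
    }
}
```

    The design point is the same one the punch cards enforced: nothing reaches the shared line until every check has passed, so the cost of a mistake is paid privately, before it can hurt anyone else.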

    This process has worked incredibly well for making sure the code base is stable, and functional. Basically, the code is examined so carefully before it's checked in that every commit is a release candidate.

    But when I think about it, that's pretty much what we were doing back when we had to work with punch cards. The cost of a mistake was grave- so we didn't make them. And maybe that's what we've lost in the intervening years as we've sped up and "improved" the development cycle.

    So here's to punch cards. You taught me a lot. Rest in Peace.

    Web App Jump Start Comparison: Generated UI

    Written By: Steve Zagieboylo, Senior Architect at Calavista

    This is the third in the author's blog series comparing two "jump start" tools for making Java-based Web Applications: JHipster and the Cuba Platform.

    Both of these platforms generate an application with a ton of functionality. There is tremendous value just in starting with a completely working application, so you can get to work on your own code instead of struggling to get the boilerplate working. However, both of these start you off with a lot more than boilerplate.

    JHipster, Cuba Framework Feature Chart

    JHipster Landing Page with Admin Menu

    Cuba Platform Admin Menu

    The default application from Cuba does not have a landing page, but jumps right into the first of the Admin pages (for the Admin user).

    JHipster Application Metrics

    Below are the sections that were on the page. Some of these, like Ehcache Statistics, are available only if that element was selected when the original application was generated. There were more options that I did not select, and I suspect that they would show up here as well.

    • JVM Metrics
      • Memory
      • System
      • Garbage Collection 
    • HTTP Requests
    • Ehcache Statistics
    • Datasource Statistics

    Cuba Platform Application Metrics

    This has a similar set of views.

    Cuba Metrics


    JHipster Configuration View

    This is a really helpful view if you have a number of different deployments with different configurations. Rather than having to go check the configuration settings on any particular instance, this view of the data is right there in the admin menu. In addition to the Spring Configuration, there are all the System Properties, the Environment Properties, and the Application Configuration: pretty much everything you use to manage the features of your system. The only downside to this view is that the values are not editable here, but that would be too much to ask.

    Cuba Configuration

    Cuba Platform Dynamic Attribute Management

    From their documentation: Dynamic attributes are additional entity attributes, that can be added without changing the database schema and restarting the application. Dynamic attributes are usually used to define new entity properties at deployment or production stage.

    Cuba Attribute


    Cuba Platform Scheduled Tasks Management

    This keys off of the schedule annotations in Java, and it gives you live information and control over these tasks. Given how challenging it is to debug issues with these tasks, just having a little more control over them seems like a great thing.

    Cuba Scheduled Tasks

    JHipster Live Log Management

    This is my favorite feature of JHipster. It automatically detects all the loggers that you have created, and it lets you change their log level on the fly.

    Web App Jump Start Comparison: Setup and Start

    Written By: Steve Zagieboylo, Senior Architect at Calavista

    This is the second in the author's blog series comparing two "jump start" tools for making Java-based Web Applications: JHipster and the Cuba Platform.

    To the amusement of my colleagues at Calavista, I am constantly saying how much I hate computers. I don't, of course, but what I do hate is how hard it is to make them do anything you haven't done before. Lots of tools have an overly complicated setup process, and there's no reason for it other than the creator of the tool not paying enough attention to how a new user gets started. I've abandoned more than one tool because, a couple of hours in, I still couldn't get it working. I always figure that if the setup is so filled with bugs, then the product probably is, too, so I don't feel any loss in giving up so easily.

    Caveat: I have already used JHipster for several projects for different Calavista customers, so my setup and start was not exactly virgin, but I am presenting it here as if it were. In fact, I had more trouble than a true newcomer would have, because I had an older version already on my system and failed to uninstall it properly on my first try.


    Overall Winner! It's a tie.

    It is really hard to compare them. Cuba was simpler to get started with, and it serves a very different purpose: it is intended for ongoing development rather than just getting started, but it offers fewer options for the final architecture. What it offers is great, if that's pretty much what you want. JHipster, on the other hand, is primarily a one-time tool with which you create your application, and then you're on your own. It is incredibly powerful in what it creates, but all that power makes it a lot harder to get started, simply because there are so many choices you have to make.


    Step One: Install-- Winner! Cuba Framework

    Cuba Framework: If you already use IntelliJ, then the Cuba install couldn't be easier-- just point at a plug-in and go. Their older product had a separate IDE, which you used for everything except the Java editing, but now it is all integrated into the one plug-in. You create a new project with File/ New/ Project, as you would expect, answer a few questions, and you're ready to run. If you don't already have IntelliJ, then they have another option which I believe includes a free version of a scaled-down IntelliJ, but I haven't tried it.

    JHipster: This is also quite easy, but not quite as simple as just adding a plug-in. They have a very clear 'Getting started' page with instructions to install Node.js and npm, and then to use them to install JHipster. This is where I ran into trouble, because I had forgotten that I had installed the old one with yarn, and my system was finding that one rather than the newer version, but that's on me. Once I got it straightened out, it all worked fine.


    Step Two: Create and Run your Application -- Winner for ease of use, Cuba Framework. Winner for power, JHipster.

    The next step was to create and to run the generated application. In both cases, I had it use the PostgreSQL database that I had already running. JHipster also has an option to use an embedded H2 database for development, which would have gotten around the next hurdle, but I had it using PostgreSQL for both development and deployment.

    A not very artificial hurdle: Just to see how well they handled the misstep, I did not create the database user or schema that it was set up to use. (I remembered that I had made this mistake when I first tried out JHipster, a few years ago.)

    Cuba Framework: This could not have been easier. The menu has acquired a new top level choice 'CUBA' (which I confirmed is only shown in an actual Cuba project, not in any of my other projects). On the menu is 'Start Application Server' which I selected. When it couldn't log in to the database, it told me clearly that the database wasn't available. Once I fixed that problem, it ran perfectly, giving me a link in the Console window to launch my browser pointing at the UI.

    JHipster: This had a few hiccups, some of which are related to the additional power that is available. First, rather than just creating a new project in my IDE, it has a command line interface that walks you through a dozen choices (many of them the same choices as in Cuba's New Project dialog, such as root package name, database provider, etc.). There was a dizzying array of selections, but most had defaults that I know are good choices. Several of these are options for which Cuba simply gave you its one choice.

    Web App Comparison

    Spoiler Alert: Cuba wins for power in other arenas, specifically the implementation of the data model. If the UI Framework is not a deal breaker, the greatly expanded set of choices for Model implementations might cause the power-hungry to swing back. See future blogs in this series.

    Once I had gotten through the application generation, npm reported some vulnerabilities, some of which were not trivially fixed. This is a little concerning.

    After generation is complete, it finishes with instructions on how to launch the server. I was able to point IntelliJ at the pom file and launch the Spring Boot application from there, but it was less obvious how to do it. (Since I already knew how, I'm not sure how much less obvious it really was.)

    JHipster did not do well, however, on the missing database test. It seemed to be running and I was able to bring up the UI, but then it gave a somewhat confusing error message, saying that authentication failed. It was actually referring to the server's authentication with the database, but that wasn't clear. At first I thought that I had just forgotten the admin password.

    Application Comparison

    Both tools create an impressive application, with a ton of functionality already working (such as User and Role management, Swagger UI, CRUD UI of all the data, and lots more). However, that is the subject of the next blog in this series.


    Using Alba to Test ASP.Net Services

    Written By: Jeremy Miller, Senior Architect at Calavista

    One of our projects at Calavista right now is helping a client modernize and optimize a large .Net application, with the end goal of everything running on .Net 5 and an order of magnitude improvement in system throughput. As part of the effort to upgrade the web services, I took on a task to replace the system's usage of IdentityServer3 with IdentityServer4, while still using the existing Marten-backed data storage for user membership information.

    Great, but there's just one problem. I've never used IdentityServer4 before and it changed somewhat between the IdentityServer3 code I was trying to reverse engineer and its current model. I ended up getting through that work just fine. A key element of doing that was using the Alba library to create a test harness so I could iterate through configuration changes quickly by rerunning tests on the new IdentityServer4 project. It didn't start out this way, but Alba is essentially a wrapper around the ASP.Net TestServer and just acts as a utility to make it easier to write automated tests around the HTTP services in your web service projects.

    I started two new .Net projects:

    1. A new web service that hosts IdentityServer4 and is configured to use user membership information from our client's existing Marten/Postgresql database.

     2. A new xUnit.Net project to hold integration tests against the new IdentityServer4 web service.

    Let's dive right into how I set up Alba and xUnit.Net as an automated test harness for our new IdentityServer4 service. If you start a new ASP.Net project with one of the built-in project templates, you'll get a Program file that's the main entry point for the application and a Startup class that holds most of the system's bootstrapping configuration. The templates generate a method that is used to configure the IHostBuilder for the application.

    For more information on the role of the IHostBuilder within your application, see .NET Generic Host in ASP.NET Core.

    That's important, because it gives us the ability to stand up the application, exactly as it's configured, in an automated test harness. Switching to the new xUnit.Net test project, I referenced my new web service project that will host IdentityServer4. Because spinning up your ASP.Net system can be relatively expensive, I only want to do that once and share the IHost between tests. That's a perfect usage for xUnit.Net's shared context support.

    First, I made what would become the shared test fixture context class for the integration tests.

    The Alba SystemUnderTest wrapper is responsible for building the actual IHost object for your system, and does so using the in-memory TestServer in place of Kestrel.

    Just as a convenience, I like to create a base class for integration tests, which I tend to call IntegrationContext.

    We're using Lamar as the underlying IoC container in this application, and I wanted to use Lamar-specific IoC diagnostics in the tests, so I expose the main Lamar container off the base class as just a convenience.

    To finally turn to the tests, the very first thing to try with IdentityServer4 was to hit the descriptive discovery endpoint, just to see whether the application was bootstrapping correctly and IdentityServer4 was functional at all. I started a new test class for those checks.

    Then I wrote a new test just to exercise the discovery endpoint.

    That test is pretty crude. All it does is try to hit the /.well-known/openid-configuration URL in the application and check that it returns a 200 OK HTTP status code.

    I tend to run tests while I'm coding by using keyboard shortcuts. Most IDEs support some kind of "re-run the last test" shortcut. Using that, my preferred workflow is to run the test once and then, assuming the test fails the first time, work in a tight cycle of making changes and constantly re-running the test(s). This turned out to be invaluable, as it took me a couple of iterations of code changes to correctly re-create the old IdentityServer3 configuration in the new IdentityServer4 configuration.

    Moving on to a simple authentication, I wrote a test to exercise the system with known credentials.

    Now, this test took me several iterations to work through, until I found exactly the right way to configure IdentityServer4 and to adjust our custom Marten-backed identity store (IResourceOwnerPasswordValidator and IProfileService, in IdentityServer4 terms) so that the tests passed. I found it extremely valuable to be able to debug right into the failing tests as I worked, and I even needed to take advantage of JetBrains Rider's capability to debug through external code to understand how IdentityServer4 itself worked. I'm sure I got through this work much faster by iterating through tests, as opposed to just running the application and driving it through something like Postman or the connected user interface.