The unit in unit-testing

Interpreting the “unit” in unit testing to mean a class is wrong. This misreading, together with an unfortunate misunderstanding of Kent Beck’s advice to “run unit tests in isolation”, still drives developers to test their classes in extreme isolation, with consequences that are then irrationally attributed to test-driven development (TDD) itself, as though the practice were not viable.

When we exaggerate decoupling in our tests, we rely on mocks and stubs to stand in for the collaborators needed by the classes being tested. As the size of our test suite grows, so do the number and usage of mocks and stubs. But there are two major problems with mocks and stubs. First, when we use them in tests, we reveal the implementation details of the classes that depend on them, because we write expectations and verifications based on a very detailed knowledge of how the mocks and stubs are used. Second, unlike real objects, which change organically in response to changes in their collaborations, mocks and stubs are resistant to change because of the very specific ways in which they are set up for test cases.
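
For instance, with a mocking library such as Mockito, a test for a hypothetical OrderService that depends on a PriceCalculator has to spell out exactly which method the implementation calls and with which arguments. The sketch below is purely illustrative; none of these class names come from a real project.

    // The mock forces the test to describe how the collaborator is used,
    // not just what the outcome should be.
    PriceCalculator calculator = mock(PriceCalculator.class);
    when(calculator.priceFor("book", 2)).thenReturn(new BigDecimal("20.00"));

    OrderService service = new OrderService(calculator);
    service.checkout(new Order("book", 2));

    // This verification encodes an implementation detail: the exact method
    // and arguments that the class under test happens to use internally.
    verify(calculator).priceFor("book", 2);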

The first issue is usually a compromise that developers accept in exchange for testability. Also, many consider it to be harmless because encapsulation is not lost and usage of the classes is unaffected.

On the other hand, the second issue is what hurts developers most and drives many to abandon TDD. The scale of the problem manifests itself once the test suite has reached a considerable size. At that point, any slight change in the nature of the collaboration between classes requires changes to numerous tests that use mocks and stubs to fake the collaboration, with countless expectations and verifications suddenly needing revision. Since the task is usually tedious, and mocks and stubs do not have actual value in production, there is little motivation to spend resources on maintaining the tests. Eventually, the practice of TDD disappears altogether.

Ideally, mocks and stubs would only be used for testing integration between components rather than collaboration between classes, but if their use cannot be avoided, their effect can at least be minimised. The most obvious way to do this is to lower the number of expectations that are set up in the test suite. Grouping tests according to the mocks and stubs they use and the expectations they need reduces the number of test cases that have to change when a collaboration changes.

There are other ways to make TDD more efficient, such as good discipline in the design of test cases (which I will write about at some point), but one improvement that can be applied immediately is to reduce the use of mocks and stubs. And the way to do that is to embrace real object collaboration within your tests.
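
Returning to the illustrative example above, the same behaviour can be tested with a real collaborator instead of a mock, so that the test asserts on the observable outcome rather than on the conversation between the classes. Again, all of the names here are hypothetical.

    // The same test, using a real collaborator instead of a mock.
    OrderService service = new OrderService(new StandardPriceCalculator());

    Receipt receipt = service.checkout(new Order("book", 2));

    // Only the outcome is asserted; if the way OrderService talks to its
    // collaborators changes, this test does not need to change with it.
    assertEquals(new BigDecimal("20.00"), receipt.total());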

Learning BASE64 encoding

I used to search the web whenever I needed to do BASE64 encoding in my code, but when I had to do it again today, I thought it would be beneficial in the long run to learn the algorithm. It turned out not to be too difficult.

The point of BASE64 is to communicate binary data as text, using only characters that are likely to exist on most computer platforms. These safe characters are known as the BASE64 alphabet and are the letters A to Z and a to z, the numerals 0 to 9, and the characters / and +. There are other ways to represent bytes as text; for example, by converting them to hexadecimal strings made up of the characters 0 to 9 and A to F. But doing so means that two hexadecimal characters are required for every byte of the original data, which doubles its size.

The BASE64 alphabet consists of 64 characters, each one associated with an integer value. For example, the character A is represented by 0, the character Z by 25, and the character / by 63. This means that to cover the range of integers from 0 to 63, the BASE64 word size must be six bits. As a consequence, during BASE64 encoding the original data must be laid out and padded so that its size in bits is divisible by six.

The smallest number of bytes (or 8-bit words) that can be re-arranged in groups of 6-bit words is three (3 × 8 bits = 24 bits, which is divisible by six). This means that data must be batched in triplets of bytes, and each triplet must be converted into four 6-bit words. The BASE64 character matching the value of each 6-bit word is then output as an 8-bit ASCII character. So, for every three bytes of data, four bytes of output are generated, giving an inflation factor of 4:3 (which is a better compromise than the 2:1 ratio from hexadecimal encoding).

Data that cannot be split exactly into groups of three bytes must be padded until they can be. For example, data that are one byte long must be padded with two zero-value bytes, and data that are 11 bytes long must be padded with one zero-value byte. In other words, data must be padded to reach a size that is divisible by three.

With the theory out of the way, here is how BASE64 is implemented in Java, using the example “any carnal pleasure”.

First, encode the string as a series of bytes.
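
In Java this is a one-liner (the variable name is mine, and StandardCharsets comes from java.nio.charset):

    // Convert the example string into its ASCII bytes.
    byte[] bytes = "any carnal pleasure".getBytes(StandardCharsets.US_ASCII);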

This results in an array of 19 bytes.

Next, pad the array with two zero-value bytes to make its size divisible by three.
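
Arrays.copyOf does this in one step, because the extra elements of the copy default to zero. Continuing the sketch above:

    // 19 bytes need 2 padding bytes to reach 21, which is divisible by 3.
    int padCount = (3 - bytes.length % 3) % 3;
    byte[] padded = Arrays.copyOf(bytes, bytes.length + padCount);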

Then, convert each triplet of bytes into four 6-bit words and calculate the value of each, using bit-shift operators. Append the BASE64 character represented by each 6-bit value to a StringBuilder instance.
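
Here is a sketch of that loop, with the BASE64 alphabet held in a string so that each 6-bit value simply indexes its character:

    String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    StringBuilder output = new StringBuilder();
    for (int i = 0; i < padded.length; i += 3) {
        // Pack the three bytes into a single 24-bit value.
        int triplet = ((padded[i] & 0xFF) << 16)
                    | ((padded[i + 1] & 0xFF) << 8)
                    | (padded[i + 2] & 0xFF);
        // Slice the 24 bits into four 6-bit words and look up their characters.
        output.append(alphabet.charAt((triplet >> 18) & 0x3F));
        output.append(alphabet.charAt((triplet >> 12) & 0x3F));
        output.append(alphabet.charAt((triplet >> 6) & 0x3F));
        output.append(alphabet.charAt(triplet & 0x3F));
    }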

This yields the BASE64 string “YW55IGNhcm5hbCBwbGVhc3VyZQAA”.

Finally, replace the padding characters (“AA” in this example, resulting from the two zero-value bytes) with the same number of “=” characters. The “=” characters are used in the BASE64 decoding process (which is not covered in this post) to determine how much padding was applied.
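
In the sketch, each zero-value padding byte has produced a trailing “A”, so the last padCount characters are simply overwritten:

    // Swap the characters produced by the padding bytes for '='.
    for (int i = 0; i < padCount; i++) {
        output.setCharAt(output.length() - 1 - i, '=');
    }
    String encoded = output.toString();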

This gives the final result “YW55IGNhcm5hbCBwbGVhc3VyZQ==”.

Now, I know that there are at least two classes in the standard Java libraries that provide BASE64 operations. One of them is undocumented and subject to change, and the other is meant to be used by the mail library, so referencing either would cause confusion (or would it just be bad form?) in code that does not otherwise depend on the libraries where those classes reside. By writing my own implementation, I avoid these unnecessary dependencies and, most importantly, I can do BASE64 in any language that does not have a built-in function for it.

How we use SQL Server Data Tools

This post describes the process that we use to develop databases with SQL Server Data Tools (SSDT) in Visual Studio.

Conventions

For this process to work, the conventions below must be followed.

  • Use the live database as the gold standard for schema objects (and data).
  • Deploy only database projects that have been built successfully.
  • Deploy to a database that matches the schema of the live database.

At the beginning of a development iteration

  1. Restore a copy of the live database onto the development computer.
  2. Synchronise database project schema objects with the schema objects in the restored database.
  3. Remove pre-deployment and post-deployment scripts from the database project.
  4. Update the database project version number.
  5. Build the database project.
  6. If the build fails, fix the errors and rebuild.
  7. If the build succeeds, check in the changes.

During a development iteration

  1. Make changes to script files in the database project.
  2. If the changes might result in data loss, write pre-deployment and post-deployment scripts to migrate the data.
  3. Build the database project.
  4. If the build fails, fix the errors and rebuild.
  5. If the build succeeds, publish the changes onto the database on the development computer and test.

Interim releases to the test environment

  1. Restore a copy of the live database from backup.
  2. Build the database project.
  3. Publish the database project onto the test server.

Deployment to the live environment

  1. Back up the live database.
  2. Build the database project.
  3. Publish the database project onto the live server.

What next?

Look what I’ve just found: my old blog dedicated to programming.

Moved from coding.mu to its present location, compromised by an attacker, and then retired altogether because it was too difficult to remove the attacker’s rootkit, it was recently recovered by my good sys-admin friend when he migrated websites to new servers.

It is a walk down memory lane for me. The first post was published on Movable Type in August 2003 when WordPress was unheard of, and most of the remaining posts are about Java or PHP. A lot has changed since then: I don’t do as much PHP coding anymore, and I no longer rave about Apple products although I’ve upgraded from my Nokia 6630 to an Apple iPhone. Some things remain the same, though: I still code a lot in Java, and NetBeans is still my favourite IDE.

I have not yet decided what to do with this blog, as I consider that there are already enough blogs dedicated to programming. In the meantime, it will stay online and — hopefully — still be useful to some readers.

Choosing the Best Tool for the Job

Kathy Sierra wrote an interesting entry on her Creating Passionate Users blog about how the “right tool” is not always the “best tool” for the job. According to Sierra, one factor that should not be neglected is the level of enthusiasm for using the tool, which can sometimes matter even more than its perceived appropriateness. Of the three main considerations (appropriateness, expertise and enthusiasm), expertise nonetheless remains essential.

I mostly agree with Sierra’s opinion that the urge to learn a tool often creates enthusiasm and drives productivity upward. However, this holds true only if the users are capable and learn efficiently. If that is not the case, there is a risk that projects get delayed as users struggle with the tool. In most cases, seasoned developers can apply their past experiences to quickly become efficient with new tools. And, driven by enthusiasm, they become highly productive.


Refactoring by Renaming in Visual Studio .NET 2005

In Visual Studio 2005, the Rename refactoring operation is very straightforward, as described below.

  1. Highlight the identifier that needs to be renamed.
  2. Type the new identifier over the selection.
  3. When the IDE displays a caret underlining the first part of the identifier, press Alt+Shift+F10. The IDE will display a drop-down list of applicable refactoring operations.
  4. Select the Rename operation from the drop-down list.


Parallels on MacBook Pro

I am very pleased with my decision to run Windows XP in Parallels instead of using a separate BootCamp partition. Not only have I avoided the hassle of rebooting every time I need to switch between Mac OS X and Windows XP, but I also get near-native performance in Windows XP.

I installed Visual Studio .NET 2005 on a virtual machine configured with 8 GB of disk space and 512 MB of RAM, expecting performance to degrade while the hefty development environment was running. I was pleasantly surprised to find that I could use it without any noticeable performance loss.


Windows XP on the MacBook Pro

I am running Windows XP in a Parallels virtual machine on my MacBook Pro. All my software works flawlessly and as fast as on a native installation, which I can only attribute to the excellent virtualisation technology in the Core Duo processor.

I considered installing Windows XP on a separate partition and using BootCamp to boot into it, but dropped the idea when I discovered Parallels. I was so impressed that I did not wait for the end of the trial period to purchase a licence. Parallels is one of those must-have applications for the Mac.
