Friday 22 April 2016

Design - avoiding the obvious

Suppose you have a process that combs a database and generates monthly reports. The rule is simple: you run at the beginning of the month, and your universe is defined by the first and last days of the previous month.

See? Simple. You've got a begin date and an end date, and you just grab everything that falls within that period.

Since this date arrangement is a recurring requirement, I decided it was time to abstract it into a script-lib, i.e., something other processes can call on the command line, and that can also be included as a library/module by other scripts.

And so I went about defining the interface, based on this use case - OK, we'll need a function that returns the begin date, and a function that returns the end date, and a rule for calculating the required interval. In my defense, I did kill the "and we could include formatting" thought as soon as it appeared.

Why all these functions? Well, SRP, and composability and all that jazz.

After a few attempts, I scrapped the idea. I wasn't happy with the interface, and I had no alternative design. I went on to do something else, and as I was looking at another problem, the solution hit me: The key is the first day of the current month. Everything else should flow from there.

This solution is in Ruby (Ruby 1.8.7, no ActiveSupport), and it goes something like this:

require 'date'

# Returns the first day of ref_date's month.
def get_first_day_of_month(ref_date)
    day = ref_date.day

    if day > 1
        ref_date -= day - 1 # back up to day 1 of the month
    end

    ref_date
end


Now, with this, it's trivial to get the begin and end dates I actually need:

first_day = get_first_day_of_month(Date.today)

begin_date = first_day << 1 # subtract 1 month
end_date = first_day - 1 # subtract 1 day

Going back to my script-lib, I can now create an interface that can both return an array/hash with the two dates (usable as a module by other Ruby scripts), and output the dates to file/stdout, separated by a delimiter specified by the caller (usable by any script). Yes, it's a modest lib, but we all gotta start somewhere.

One thing I don't understand is the lack of modifier operators in date/time frameworks (see Java's Calendar class and Free Pascal's dateutils Recode* functions for two examples of how to do this right). It would've been a lot simpler to do something like this:

first_day = Date.today().day!(1)

Yes, I know, Ruby has open classes, I can add it myself. But just because I can, doesn't mean I should.

Thursday 18 February 2016

Explicit template instantiation - Exhibit 1

As I promised last time, let's put our design to work on some code.

Back in 2013, I wrote a couple of template functions as a quick solution to a very specific problem - outputting non-ASCII characters on a Windows console in a program compiled with MinGW. Here's the header, and here's the source.

I've now applied the idea from the last post, and changed the code from full inclusion to explicit instantiation. There are other changes to be made, but those will wait.

If you go back in the header's history, you'll see that the previous version used full inclusion. As such, this was included in every translation unit (TU) that used this code:

#include "boost/locale.hpp"
#include "windows.h"
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

Not necessarily outrageous. However, all of this was included because of a couple of functions. Not only that, but because the implementation was in the header, some helper functions had to be declared in the header, too, thus leaking implementation details.

Now, applying our new-fangled design, here's what we get.

Header (win_console_out.h)

Contains just the declaration for the functions.

template <typename CharT>
std::string ConvertOutput(CharT const* s);

template <typename CharT>
std::string ConvertOutput(std::basic_string<CharT> const& s);

template <typename CharT = DefaultCharT, typename T = void>
std::string ConvertOutput(T const& t);

Because the only #include we need is <string>, everything else has been removed from the header.

Source (win_console_out.cpp)

Just as before, contains the non-template helper functions. These are not part of the interface, and were removed from the header.

Implementation (win_console_out.ipp)

Most of the previous content of the header ended up here - all the #includes, the declarations for the helper functions, and the template function definitions.

Explicit instantiation

Finally, each user of this code will supply its own explicit instantiation. In this case, you can see here what we're defining:

template std::string ConvertOutput(char const* s);
template std::string ConvertOutput(std::basic_string<char> const& s);
template std::string ConvertOutput(wchar_t const* s);
template std::string ConvertOutput(std::basic_string<wchar_t> const& s);

And that's it. I'll use my extensive (hah!) body of published code as a test for this design. If it works, I plan to keep using it, in order to reveal its ugly warts.

I've also created the mechanism for reverting to full inclusion via #defined variables, either globally or on a header-by-header basis. As I usually say, I believe users should have the right to choose, and I strive to keep to that principle.
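
For reference, the mechanism looks roughly like this in the header - a sketch only, with macro names that are my assumption, following the pattern described in the previous post (below):

#if defined(WIN_CONSOLE_OUT_FULL_HEADER) || defined(ALL_FULL_HEADER)
#define WIN_CONSOLE_OUT_INLINE inline
#include "win_console_out.ipp"
#else
#define WIN_CONSOLE_OUT_INLINE
#endif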

Sunday 14 February 2016

Explicit template instantiation as an organization tool

When you create a class template, the easiest way to organize your source code is to splat everything in a header file and be done with it. Users of your class just #include your header, and Bob's an annoying (in an endearing way) relative you get to see once a year.

Of course, the easiest way has a couple of drawbacks.
  1. Your header's code will get compiled for each translation unit (TU) that #includes it, even when the same instantiation has already been compiled in another TU. During linking, all those identical copies (instantiations for the same types) will be discarded.
  2. Since your header contains the implementation, it's chock-full of... implementation details. Among other things, this means it probably #includes other headers. See #1.
This means your compilation time for class templates will put on some weight. Wouldn't it be nice if we could put it on a diet?

I've been taking a shot at this issue for the last few days, with the following goals:
  • The header should contain only declarations.
  • The implementation should go on a separate file.
  • The class template user should have as little work, and as many options available, as possible.
So, we'll begin with the header file. This file contains the class template declaration, and this is the file that will be #included by all the code that uses our class template.

Note: I'm leaving out the include guards, but they're necessary, same as in any other header.

listener.h
template <typename T>
class Listener
{
public:
    Listener();
    ~Listener();
private:
    T i;
};


We place the implementation in another file.

listener.ipp
#include "listener.h"
#include <iostream>

#include <string>

void print(std::string s)
{
    std::cout << s << '\n';
}

template <typename T>
Listener<T>::Listener()
{
    print("Listener:ctor");
}

template <typename T>
Listener<T>::~Listener()
{
    print("Listener:dtor");
}


Note we don't have to #include <iostream> or <string> in the header.

However, if we tried to build our program like this, it would fail: the TUs that #include only the header never see the implementation, which means we'll get missing symbols at link-time.
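
For instance, a minimal user TU like this one compiles cleanly, but fails at link-time with undefined references to Listener<Session>'s constructor and destructor (Session is just a stand-in for the user's type, same as in the instantiation example below):

main.cpp
#include "listener.h"
#include "session.h"

int main()
{
    Listener<Session> listener; // compiles fine; the linker is the
                                // one that will complain
}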

So, here we get to the first task for the class template user - creating an artificial TU to generate the code he needs.

listener_session.cpp
#include "listener.ipp"
#include "session.h"

template class Listener<Session>;


This explicit instantiation of the class template will generate all the template's members, which will allow the linker to find them and resolve the undefined references to those symbols.

Simple, heh?

Well, keeping the template declaration and definition in separate files is all fine and dandy, but if it were that simple, everyone would be doing it. There is a trade-off, and, as is usually the case, 80% of the time it’s probably not necessary; in fact, I suspect the old 80/20 rule may even be more skewed in this case.

Which raises the question – why am I spending time on this? Because that’s how I learn.

So, what’s the trade-off here? In order for this to work you must create explicit instantiations for all the types you will be using. So, instead of just #include-ing the header and defining your Listener<Whatever> variables, your project must have one or more TUs that #include both the header and the implementation and then contain explicit instantiations for the types you’ll be using, just as shown with listener_session.cpp, above.

And this means the projects that stand to benefit more from this practice are the ones where it requires the most work, namely, large projects.

The class template author can’t predict all the types that will parametrize the template, so he can’t supply the explicit instantiations (although sometimes he does, more about that in a minute). So, this means it’s up to the class template user to create the TUs with the required explicit instantiations. I happen to think the trade-off is worth it, but I’ve never been involved in a huge C++ project, so I could be wrong.

Has it been a minute, already? OK, let’s go to the case where the class template author organizes his code like this and supplies the explicit instantiations – when he wants to limit the types available to instantiate the template. By doing this, and withholding the source code, the class author guarantees that users of his template can’t instantiate it with any type other than those supplied.
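
In code, that scenario could look something like this (a sketch, with types picked for illustration): the author compiles a TU like the one below into the library he ships, publishes listener.h, and withholds listener.ipp.

listener_instantiations.cpp
#include "listener.ipp"

#include <string>

// The only instantiations users will ever get; any other type
// fails to link, because the implementation is withheld.
template class Listener<int>;
template class Listener<std::string>;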

So, we get the first two goals, but so far the third goal doesn't look good. There's not much we can do, within the language, to reduce the work required from our user. But we can give him more options.

Suppose our user doesn't actually care about all this. Maybe he's just developing a hello_template_world, or he loves coffee, and doesn't mind a 10 min. coffee-break every 30 mins.

In that case, we have several options.

Just include the .ipp file.
That's it, nothing more needs to be done.

Create a header that includes the .ipp file.
This allows the author some liberty with the naming of the .ipp file, if necessary. Other than that, it's just the same as including the .ipp file.

Quite simple, heh?

Yes, indeed. And quite wrong, too. We won't have much of a problem with the class template members, but that void print(std::string s) will send us straight into duplicate-symbol-land (BTW, triggering this is actually the only reason it exists).

So, how can we go about it? There's a simple solution (really), and it doesn't require that much extra work.

listener.h
template <typename T>
class Listener
{
public:
    Listener();
    ~Listener();
private:
    T i;
};

#if defined(LISTENER_FULL_HEADER) || defined(ALL_FULL_HEADER)
#define LISTENER_INLINE inline
#include "listener.ipp"
#else
#define LISTENER_INLINE
#endif

We give our users two options: they can either #define LISTENER_FULL_HEADER, which applies full inclusion to this class template only; or #define ALL_FULL_HEADER, which applies full inclusion to every class template.
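
In practice, opting in is a one-liner; for example (note the #define must come before the #include):

// In a user's TU - full inclusion for this class template only
#define LISTENER_FULL_HEADER
#include "listener.h"

// Or, for every class template at once, on the command line:
//     cl /DALL_FULL_HEADER ...
//     g++ -DALL_FULL_HEADER ...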

And our implementation becomes this:

listener.ipp
#include "listener.h"
#include <iostream>

#include <string>

LISTENER_INLINE
void print(std::string s)
{
    std::cout << s << '\n';
}

template <typename T>
LISTENER_INLINE
Listener<T>::Listener()
{
    print("Listener:ctor");
}

template <typename T>
LISTENER_INLINE
Listener<T>::~Listener()
{
    print("Listener:dtor");
}


#undef LISTENER_INLINE

The #undef at the end is just for preprocessor hygiene.

How do we know this actually works as advertised? Here’s what we get when we build the program with each option. The code was built with VS 2015, and we used dumpbin /symbols on the resulting .obj files.

The results are edited, to fit on a single line.

Full inclusion gives us this:

ear.obj
1B0 SECT65 notype External Listener?3?5dtor?6?$AA@
1B4 SECT66 notype External Listener?3?5ctor?6?$AA@


main.obj
1C9 SECT6B notype External Listener?3?5dtor?6?$AA@
1CD SECT6C notype External Listener?3?5ctor?6?$AA@


The symbols are defined in the two object files; one of the copies will be dropped when linking.

Separate implementation gives us this:

ear.obj
020 UNDEF notype () External Listener<Session>::Listener<Session>(void))
021 UNDEF notype () External Listener<Session>::~Listener<Session>(void))


main.obj
075 UNDEF notype () External Listener<Session>::Listener<Session>(void))
076 UNDEF notype () External Listener<Session>::~Listener<Session>(void))


listener_instant.obj
199 SECT60 notype External Listener?3?5dtor?6?$AA@
19D SECT61 notype External Listener?3?5ctor?6?$AA@


We get the undefined symbols on ear.obj and main.obj, which will be resolved during linking.

So far, so good. Now, time to move this out of Proof-of-Concept-Land.

Monday 8 February 2016

This behavior is by design

Visual customization (or personalization, or whatever else you want to call the ability to configure software to suit your specific needs) is a big thing for me. Especially for software that I use for long periods of time. And, for this kind of software, there's nothing more important than changing the color scheme. Sure, changing the font can be nice, changing the font size is definitely important, but changing the color scheme is, as far as I'm concerned, essential.

For me, there's nothing worse than staring at a glaring white background for hours. I've got nothing against all you 0xFFFFFF lovers out there, it's just not my thing.

These last few days, I've been comparing several visual customization alternatives, from the user's point of view. Why from the user's point of view? Because I consider that to be the most important point of view: we create software for users.

Also, when selecting the criteria for classification, there was one aspect that outweighed the others by several orders of magnitude: Time. It's not just about changing the look of software, but about doing it quickly.

As a final note, when I talk about visual customization, I mean the ability to change just about every visual aspect of the interface. There is a name for software that has a "customization" option that only allows you to change the color of the title bar or the toolbar, but I won't say it here, there may be children reading this (I hear it's great for insomnia).

So, here's the final result, from best to worst:

1. The software provides official alternatives to the default look

"Official" here means "created or curated by the software developer", and easily accessible/installable via some sort of manager that takes care of download/installation. To qualify, these alternatives must be diverse (not just variations on glaring white) and must actually work (e.g., no blue text on a black background - yes, Linux shell, I'm looking at you).

Oh, and "curated" means someone actually looked at the whole thing and confirmed that we're not getting the blue-on-black stroke of genius. Being free of bugs/exploits/nasty surprises is a must, but it's not enough.

For software that makes this grade, I don't really care that much about how difficult it may be to change individual aspects of the provided looks/themes/whatever, because we have a coherent whole that works.

Out of the software I use, the winners are Visual Studio and Qt Creator. Congratulations to these teams, top quality work. Android Studio follows right behind, and the only reason it's not up there with those two is that it does have some hard-to-read combinations, where the text and background colors are very similar.

2. The software is easy to configure

Since we're not guaranteed to have a coherent alternative here, it must be easy to either change the whole look (e.g., via theming) or individual aspects of it.

So, here we may have software that has a "wealth of community-provided looks/themes/whatever", where trying out a theme is trivial, but changing each individual aspect is not - e.g., Chrome/Firefox extensions.

And we also have software that may or may not have all that wealth, but has a trivial way of either changing the whole look or individual aspects of it - e.g., Notepad++ or gedit.

Changing individual aspects may not give the user a complete coherent workspace, but it will provide a good starting point and, since it's easy to configure, it will allow the user to quickly solve problems as they appear. Again, a very time-efficient solution. Not as good as #1, but a positive experience, all things considered.

3. The software is not easy to configure

Here, it's irrelevant if we're talking about the whole look or individual aspects of it.

This is the software that expects the user to manually download some sort of file (usually, an archive file), copy it to some directory and expand it; or to copy some existing files from a system-protected directory to a user directory and then edit those, looking for some more-or-less cryptic configuration key and changing some usually-even-more-or-less cryptic value; and then, maybe, having to restart the software. Bonus (negative) points if it's a system-wide configuration, where "restart" actually means "reboot".

Most Linux desktops I've tried fall in this category. In order to change a simple visual aspect, if you can't find a recipe for changing the exact item you want, you're in for a good reading of Gtk, or Qt, or some-other-widget-toolkit docs (assuming what you want is properly documented), followed by some file copying, key-searching, and value tweaking. And, since what you'll get will usually be a good starting point, you'll have the enormous pleasure of repeating the process as further problems appear.

Oh, and if you do find the recipe you're looking for, check its date, and make sure the toolkit/software version match yours.

Here, I usually do just enough work to splash a dark background on the software I use the most and ignore everything else. I definitely don't want to waste more time than absolutely necessary.

4. The abomination

This is a special category. You may have already got that impression from the heading, but it's also special in that it has only one entry: Windows Aero.

Let's make this clear - I often look at design options and think "This is incredibly stupid, but maybe there is some logic that I'm missing here". As the man sang, "I'm not the sharpest tool in the shed", so there's definitely room for error on my part. However, when we get to Windows Aero, I can't get past the comma in that sentence. I've considered it several times, and I can see no logic here.

Let's look at the symptoms:
  • Countless posts asking what should be a very simple question, "How do I change the background color for Windows Explorer?", and getting one of two answers: Either "Switch from Aero to Basic/Classic/Anything-Else-That's-Not-Aero" or "Why would you want to do that, there's plenty of people that use white and love it". Both these answers actually mean "You can't".

  • Of course, that's not entirely true. You can. You just have to reverse-engineer the binary format that stores the color configuration (basically it's a DLL containing resources). Or pay for a utility created by someone who went through that effort; then, you'll still have to create your "visual style", and you'll still have to patch Windows to allow you to use your non-signed "visual style".

  • Yes, you read that correctly. Binary format? Check! DLL? Check! Patch Windows? Check! Options such as the background color for applications are stored (although "buried", "concealed", or even "encrypted" would be more suited here) in an undocumented binary format, in a DLL, stored in a protected system folder.

This is the only example I've found where your best option is to actually pay for a third-party application to do something as simple as changing a background color. BTW, this is also why you find successful commercial alternatives to Windows Explorer. It's not just the extra functionality these alternatives have (and they do add value to their offerings), it's also this brain-dead design by the Aero team, which turns something as simple as "the ability to change the background color" into a value point.

Here, I don't even waste my time. I'm just grateful that whoever designed Windows Aero didn't get to unleash their remarkable genius anywhere near Visual Studio.

Wednesday 6 January 2016

Logging Abstraction Layer - Take 2

As I did on the previous post, I'll begin by saying I won't go into another "My, how long it has been" post. I do, however, hope I'm not too late to express my Best Wishes for 2016.

Now, then...

I'm having another iteration on my idea for an abstraction layer for logging libraries, which is basically fancy-speak for “a collection of (hopefully, organized) macros” that will allow me to replace logging libraries with minimal effort.

I've been going back and forth between Boost Log and Poco Logger. I like both, even though I tend to prefer Poco Logger, because I haven't yet found a satisfactory solution for the issue with Boost Log's preference for truncating an existing log file when the application is restarted.

I've hit some maintenance problems with my “collection of macros”, so I decided to give Boost Preprocessor another go. Back when I began working on this idea, I looked into Boost PP, but got caught in a blind spot, probably a faulty reaction to all the metaprogramming examples. I had the feeling I could use it, but I wasn't able to make it work.

So, I rolled my own and have since been banging my head against several walls, thus gaining the required experience to ask a number of questions I didn't even know were there. Which gave me a different perspective when I approached Boost Preprocessor again, and things went much better this time.

I quickly set up a PoC, and then I hit the issue I'm looking at now – how to format the log record. This is only somewhat important for readability, but it's crucial for automation. Without a structured format, we'll have a hard time creating tools to operate on the logs.

My initial design, which I haven't changed, treats the logging statement as a sequence of values to output. E.g.,

SOMEPREFIX_DEBUG(object_id, discount_id, "Getting object price");

But a structured format requires other artifacts – at a minimum, we need delimiters. And an obvious requirement for some sort of intelligent automation is that the nature of the values must be identified, i.e., we need to know whether an element logged as an integer is, say, a total or an ID; so, we need to decorate values with names.

Boost Log gives us attributes, values which are added to the log record without the need to output them explicitly on the logging statements. Furthermore, attributes can be used in filtering and formatting, and can be scoped. I suppose there may even be performance benefits in the use of attributes, but that's something I'd have to measure.
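
To illustrate, here's a minimal sketch of a scoped attribute, using Boost Log's trivial logger for brevity (the function name is mine; and for the attribute to actually show up in the output, a formatter still has to reference it):

#include "boost/log/attributes/constant.hpp"
#include "boost/log/attributes/scoped_attribute.hpp"
#include "boost/log/trivial.hpp"

void get_object_price(int object_id)
{
    // Every record emitted in this scope carries ObjectID, without
    // the value appearing in the logging statements themselves.
    BOOST_LOG_SCOPED_THREAD_ATTR("ObjectID",
        boost::log::attributes::constant<int>(object_id));

    BOOST_LOG_TRIVIAL(debug) << "Getting object price";
}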

At first, I thought this could be the way to go. Define attributes, create formatters or override the output operator, and control the way the record is written to file.

However…

Poco Logger does not have this concept. And while I'm not considering any other logging library at the moment, that may change in the future. And, as I said above, my main goal is to “replace logging libraries with minimal effort”. So, if this is my main goal, it is important enough to warrant an under-utilization of the library.

Naturally, if I don't use the library's capabilities to output this data, I'll need to output it on the logging statements. This means that formatting will also have to happen there. And since I'm already using the preprocessor as an abstraction mechanism, I'll just… use it some more.

So, I already have something like this:

SOMEPREFIX_DEBUG(object_id, discount_id, "Getting object price");

What I need is something along these lines (simplified):

SOMEPREFIX_DEBUG
(
    SOMEPREFIX_FORMAT("ObjectID", object_id),
    SOMEPREFIX_FORMAT("DiscountID", discount_id),
    "Getting object price"
);

Which could, then, output something like this (line breaks for readability):

2016-01-01 00:00:00 <DEBUG>
    { "ObjectID": "42", "DiscountID": "24" } Getting object price

Then, one day, if we decided life had become too easy, we could switch formats to something like this, without changing the logging statements:

2016-01-01 00:00:00 [DEBUG]
    <LogRecord><ObjectID>42</ObjectID>
    <DiscountID>24</DiscountID></LogRecord>
    Getting object price
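
As a proof of concept, here's one way the JSON-style output could be produced. This is a toy sketch, not the actual lib: the arity is fixed at two pairs plus a message, and the timestamp and level are left out. The point is that the format lives entirely inside the macros, so swapping their bodies changes the output without touching the logging statements.

#include <iostream>
#include <sstream>
#include <string>

// Renders a value as a JSON-style "name": "value" pair.
template <typename T>
std::string format_pair(char const* name, T const& value)
{
    std::ostringstream os;
    os << '"' << name << "\": \"" << value << '"';
    return os.str();
}

#define SOMEPREFIX_FORMAT(name, value) format_pair(name, value)

// Toy version, fixed arity; the real one would take a variable
// number of elements, e.g., via Boost Preprocessor sequences.
#define SOMEPREFIX_DEBUG(p1, p2, msg) \
    std::cout << "{ " << (p1) << ", " << (p2) << " } " << (msg) << '\n'

int main()
{
    int object_id = 42;
    int discount_id = 24;

    SOMEPREFIX_DEBUG(
        SOMEPREFIX_FORMAT("ObjectID", object_id),
        SOMEPREFIX_FORMAT("DiscountID", discount_id),
        "Getting object price");
}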

So, the next step will be identifying the required formatting elements, and how to incorporate them in the logging statements. The goal here is to keep the logging statements as simple as possible.

On this iteration, I will leave out any sort of container/compound data type. Not only will these make the design a lot more complex, but I'm prepared to do without them – in my experience, I have found very few scenarios requiring the logging of these types, and it has always been possible to find a workaround somewhere between acceptable and undesirable.
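
For completeness, one possible workaround of that sort: flatten the container into a single value before logging it. A sketch (the join helper is mine, not part of any library):

#include <sstream>
#include <string>

// Joins a container's elements into one delimiter-separated string,
// so the result can be logged as a plain value.
template <typename Container>
std::string join(Container const& c, char const* sep = ", ")
{
    std::ostringstream os;
    for (typename Container::const_iterator it = c.begin();
        it != c.end(); ++it)
    {
        if (it != c.begin())
            os << sep;
        os << *it;
    }
    return os.str();
}

// Usage, with the macros above:
//     std::vector<int> ids;
//     SOMEPREFIX_DEBUG(SOMEPREFIX_FORMAT("DiscountIDs", join(ids)),
//                      SOMEPREFIX_FORMAT("ObjectID", object_id),
//                      "Getting object price");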

Tuesday 13 October 2015

Visual Studio 2015, ICU, and error LNK2005

I'll begin by saying that I'm just going to ignore the fact that I haven't written anything in nearly nine months.

So...

While building ICU 56.1 with VS 2015, I was greeted with thousands of errors like this (also described here by someone who came across the same problem):

error LNK2005: "public: static bool const
std::numeric_limits<unsigned short>::is_signed"
(?is_signed@?$numeric_limits@...@std@@2_NB) already defined in
ParagraphLayout.obj

This is defined in <limits>, in a statement like this:

_STCONS(bool, is_signed, false);

Looking at the pre-processor output, we can see its actual definition:

static constexpr bool is_signed = (bool)(false);

If I understood the Standard correctly, this should be OK, and there should be no duplicate symbols during linking. So, I was still missing a logical cause for this.
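
A reduced version of the pattern, for reference: a static constexpr data member initialized in-class, #included from any number of TUs. Per the Standard, the in-class initializer does not turn this into a definition that could clash across TUs, so no duplicate symbols should result - which is exactly what made the errors look so strange.

// limits_like.h - safe to include from any number of TUs
struct limits_like
{
    static constexpr bool is_signed = false;
};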

The usual internet search for «ICU LNK2005» didn't bring anything useful, except for the link above.

Then, as I concentrated my search on LNK2005, I came across this post. The same mysterious behaviour, but now there was a plausible explanation, in a comment by MS's Stephan T. Lavavej, in a quoted post from an MSDN blog:

We recommend against using /Za, which is best thought of as "enable extra conformance and extra compiler bugs", because it activates rarely-used and rarely-tested codepaths. I stopped testing the STL with /Za years ago, when it broke perfectly conformant code like vector<unique_ptr<T>>.  
That compiler bug was later fixed, but I haven't found the time to go re-enable that /Za test coverage. Implementing missing features and fixing bugs affecting all users has been higher priority than supporting this discouraged and rarely-used compiler option.

So, after removing /Za from all projects in ICU's allinone VS Solution (Project Properties -> Configuration Properties -> C/C++ -> Language -> Disable Language Extensions -> No), I was able to build it with no errors, on all configurations (x86/x64, debug/release).

Apparently, it's one of those rare cases where the error is actually in the compiler, not in the code.

Saturday 31 January 2015

CA Certificates - The tale of the invisible certificate

I've been through a memory upgrade on my 5-year old PC. My goal is to set up a few VMs running simultaneously, because I need to widen my scope for experimentation. I found out my BIOS has an incompatibility with the memory DIMMs currently available, but fortunately a friend lent me 8GB, so I can start working now, while I try to sort out this mess.

As I set up each VM, I'm importing my bookmarks, so that I have my net environment available "everywhere". And I've come across a curious situation, regarding certificates.

One of the URLs I have on my bookmarks is https://www.ddo.com/forums. The first time I accessed it on Firefox, I got an error message:
Peer's certificate has an invalid signature. (Error code: sec_error_bad_signature)

Using openssl s_client, I checked that ddo.com sends only its own certificate, not the chain, so I looked up the chain in IE, and then checked the intermediate CA in Firefox's certificate store. It was there, but it was a different certificate - different signature, different validity (both currently valid, since the two validity periods overlapped), different issuer; only the subject was the same.

I exported that CA certificate from IE, and ran openssl verify using each CA certificate; with the one from Firefox's certificate store, I got an error; with the site's CA certificate, the validation succeeded.

So, I imported the site's CA certificate to Firefox, accessed the site, and all was well again.

Then, I checked Firefox's certificate store. And I only found the exact same certificate that was there already, and which wasn't previously validating ddo.com's certificate. Except that now it was.

And much scratching of head ensued.

Until yesterday, when discussing this at lunch with a friend, he told me the obvious: "Well, if you imported it, and the site's certificate is now correctly validated, then it must be there, even if you can't see it". And that gave me a memory jolt, to an issue I had a little more than a year ago, with Sun One's web server certificate store, where we had two certificates for the same CA, but only one was visible on the web console. In order to correctly see both, I had to use certutil on the command line.

And in this case, the solution was the same:

certutil -L -d sql:<path to Firefox profile directory> -n <certificate subject>

Which promptly listed the two certificates.

And another Mystery of the Universe was solved during a meal.

I don't understand why the GUI shows just one certificate. I'm not going to say it's stupid because it may be a reasonable decision, based on knowledge I don't have. But to completely hide the fact that a CA has two simultaneously valid certificates on the store is terribly misleading, it's definitely not what I'd call a good solution.

In the end, it was command line to the rescue... as usual.