Saturday 17 November 2012

Success with the new design! And RemoteSessionHandle

It's still early days and there's still a lot to polish, but it connects and disconnects successfully.

In this post, I'll introduce RemoteSessionHandle, the class responsible for implementing RAII on the remote SSH session "resources", i.e., the SSH session on the remote host.

Once again, the interface is simple:

class RemoteSessionHandle
{
public:
    RemoteSessionHandle(boost::function<void(std::string)>& pReportStatus,  
        boost::asio::ip::tcp::socket& pSock, SessionHandle& pSessionHandle);
    ~RemoteSessionHandle();
 

    void CloseSession();
 

    // WE MAY ADD MOVE IN THE FUTURE, BUT NOT COPY
    RemoteSessionHandle(RemoteSessionHandle&&) = delete;
    RemoteSessionHandle& operator=(RemoteSessionHandle&&) = delete;
    RemoteSessionHandle(RemoteSessionHandle const&) = delete;
    RemoteSessionHandle& operator=(RemoteSessionHandle const&) = delete;
private:
    utils::TaskState DoHandshake();
    utils::TaskState DoCloseSession();
 

    utils::CleanupState cleanupState;
    boost::function<void(std::string)>& reportStatus;
    boost::asio::ip::tcp::socket& sock;
    SessionHandle& sessionHandle;
};


Ctor, dtor, and a method to explicitly close the session on the remote host. The private methods are the handlers for the loop functions.
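For reference, here's roughly what the utils types used throughout these posts look like. This is a sketch inferred from how they're used; the real definitions live in misc.h:

namespace utils
{
    // Returned by the Do*() handlers on each loop iteration.
    enum TaskState { TASK_WORKING, TASK_DONE };

    // Tracks whether cleanup has already been attempted, so the dtor
    // knows whether it still has work to do.
    enum class CleanupState { NoCleanup, CleanupInProgress, CleanupDone };
}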

We're storing references to the report callback (to let the UI know what we're doing), to the socket (used in the ctor to perform the handshake), and to an instance of SessionHandle (because the libssh2 calls need the LIBSSH2_SESSION* stored in SessionHandle).

RemoteSessionHandle::RemoteSessionHandle( 
    boost::function<void(std::string)>& pReportStatus,
    boost::asio::ip::tcp::socket& pSock, SessionHandle& pSessionHandle) :
    cleanupState(utils::CleanupState::NoCleanup), 
    reportStatus(pReportStatus), sock(pSock), sessionHandle(pSessionHandle)
{
    if (!reportStatus.empty())
        reportStatus("SSH Handshake");
 

    io_service& ios = sock.get_io_service();
    ios.reset();
    sock.async_read_some(boost::asio::null_buffers(), bind(AsyncLoopSocket,
        protect(bind(&RemoteSessionHandle::DoHandshake, this)), 
        boost::ref(sock), true));
    ios.run();
}


The ctor contains the SSH handshake. We get the io_service from the socket, and use it to drive our loop.
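The AsyncLoop functions were presented in an earlier post (19 October, below). As a reminder, the socket-driven one looks roughly like this - a sketch reconstructed from how it's called here, so the exact signature is an assumption:

void AsyncLoopSocket(boost::function<utils::TaskState()> task,
    boost::asio::ip::tcp::socket& sock, bool reading)
{
    // Run one iteration of the task; while it reports TASK_WORKING, wait for
    // the socket to become readable/writable again and repeat. The extra
    // handler arguments (error_code, bytes transferred) are ignored by bind.
    if (task() == utils::TASK_WORKING)
    {
        if (reading)
            sock.async_read_some(boost::asio::null_buffers(),
                boost::bind(AsyncLoopSocket, task, boost::ref(sock), reading));
        else
            sock.async_write_some(boost::asio::null_buffers(),
                boost::bind(AsyncLoopSocket, task, boost::ref(sock), reading));
    }
    else
    {
        // Task finished; stop the io_service so run() returns.
        sock.get_io_service().stop();
    }
}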

RemoteSessionHandle::~RemoteSessionHandle()
{
    if (!reportStatus.empty())
        reportStatus("Cleanup: SSH remote session.");
 

    // WE'LL ONLY TRY TO CLEANUP IF NO ATTEMPT HAS BEEN MADE YET
    if (cleanupState == utils::CleanupState::NoCleanup)
    {
        try
        {
            CloseSession();
        }
        catch (...)
        { }
    }
}


The dtor's behaviour is similar to SessionHandle's dtor. We only perform the cleanup if no attempt has been previously made.

void RemoteSessionHandle::CloseSession()
{
    io_service ios;
    deadline_timer dt(ios);
 

    dt.expires_from_now(milliseconds(10));
    dt.async_wait(bind(AsyncLoopTimer, 
        protect(bind(&RemoteSessionHandle::DoCloseSession, this)), 
        boost::ref(dt), 10));
 

    cleanupState = utils::CleanupState::CleanupInProgress;
    ios.run();
 

    cleanupState = utils::CleanupState::CleanupDone;
}


CloseSession() sets up the loop with a timer, and manages the cleanup state.
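AsyncLoopTimer() follows the same idea, but is driven by a deadline_timer instead of the socket. Again, this is a sketch reconstructed from its call sites, not the exact code:

void AsyncLoopTimer(boost::function<utils::TaskState()> task,
    boost::asio::deadline_timer& dt, long intervalMs)
{
    // Run one iteration; while the task reports TASK_WORKING, re-arm the timer.
    // Once it reports TASK_DONE, nothing else is queued and run() just returns,
    // so there's no need to call stop() here.
    if (task() == utils::TASK_WORKING)
    {
        dt.expires_from_now(boost::posix_time::milliseconds(intervalMs));
        dt.async_wait(boost::bind(AsyncLoopTimer, task, boost::ref(dt), intervalMs));
    }
}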

utils::TaskState RemoteSessionHandle::DoHandshake()
{

    int rc = libssh2_session_handshake(sessionHandle.GetSession(),  
        sock.native_handle());
 

    if (rc == LIBSSH2_ERROR_EAGAIN)
    {
        return utils::TASK_WORKING;
    }
 

    if (rc)
    {
        BOOST_THROW_EXCEPTION(SSHHandshakeError() <<
            ssh_error_string("Error " + lexical_cast<string>(rc) +  
            " during SSH Handshake.") << ssh_error_id(rc));
    }
 

    return utils::TASK_DONE;
}
 

utils::TaskState RemoteSessionHandle::DoCloseSession()
{
    int rc = libssh2_session_disconnect(sessionHandle.GetSession(),  
        "Disconnect SSH session");
 

    if (rc == LIBSSH2_ERROR_EAGAIN)
    {
        return utils::TASK_WORKING;
    }
 

    if (rc)
    {
        BOOST_THROW_EXCEPTION(SSHDisconnectSessionError() <<
            ssh_error_string("Error " + lexical_cast<string>(rc) +  
            " disconnecting SSH session.") << ssh_error_id(rc));
    }
 

    return utils::TASK_DONE;

}


DoHandshake() and DoCloseSession() follow the same pattern. There's not much to add here.

This post is very short on words, mostly just stating the obvious. I believe that's a good thing - it means the class is simple.
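To make the intent concrete, here's a rough sketch of how a caller might stack these handles. The connection setup and the status callback are placeholders I've made up for illustration, not code from the project:

boost::asio::io_service ios;
boost::asio::ip::tcp::socket sock(ios);
// ... resolve the host and connect sock (omitted) ...

boost::function<void(std::string)> reportStatus;  // e.g., bound to some UI callback

SessionHandle session;                                    // local LIBSSH2_SESSION*
RemoteSessionHandle remote(reportStatus, sock, session);  // performs the SSH handshake

// ... authentication, commands, etc. ...

// When remote goes out of scope it disconnects the remote session;
// then session frees the local resources.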

The code is here.

Monday 12 November 2012

SSH Session - Class SessionHandle

And... I'm back. Work has gone back to manageable levels, and I've taken a few days off from C++ to give my head a chance to catch up.

I also took the opportunity to get some feedback on my design. It was quite satisfying to hear that I've been heading in the right direction all this time. And, once again, while writing (and rewriting) that question, I've hit yet another design change, one that I believe will simplify my work when I introduce other forms of authentication (for now, it's just login/password).

So, let's get moving.

Today, I'll introduce SessionHandle, the class responsible for implementing RAII on the local SSH session resources.

The interface is pretty simple:

class SessionHandle
{
public:
    SessionHandle();
    ~SessionHandle();
 

    void CloseSession();
    LIBSSH2_SESSION* GetSession() { return session; }
 


    // WE MAY ADD MOVE IN THE FUTURE, BUT NOT COPY
    SessionHandle(SessionHandle&&) = delete;
    SessionHandle& operator=(SessionHandle&&) = delete;
    SessionHandle(SessionHandle const&) = delete;
    SessionHandle& operator=(SessionHandle const&) = delete;
 


private:    
    utils::TaskState DoCloseSession();
 


    utils::CleanupState cleanupState;
    LIBSSH2_SESSION *session;
};

We define a ctor/dtor. All other special operations are deleted. I suppose I may have to implement move semantics, but I won't worry about that now. CloseSession() allows the client to explicitly release the resources, in which case the dtor won't do anything. GetSession() is necessary for the other classes that need the LIBSSH2_SESSION*, although I may review this and make those classes friends, thus removing the need to expose it to the outside world.

DoCloseSession() is the handler called by the loop functions, and cleanupState is a flag that tells us whether we still need to clean up in the dtor.

The code itself is also simple, which is good, since this was one of the goals of this redesign.

SessionHandle::SessionHandle() : 
    cleanupState(utils::CleanupState::NoCleanup), 
    session(nullptr)
{
    session = libssh2_session_init();
 

    if (!session)
    {
        BOOST_THROW_EXCEPTION(SSHCreateSessionError() << 
            ssh_error_string("Error creating SSH session."));
    }
 

    libssh2_session_set_blocking(session, 0);
}


There's no need for a loop here, because creating the session is a purely local operation - it either succeeds or fails outright, and never returns EAGAIN. Speaking of which, for now the session's blocking mode is hard-coded (we always set it to non-blocking). I may make it configurable in a future version.

SessionHandle::~SessionHandle()
{
    // WE'LL ONLY TRY TO CLEANUP 
    // IF NO ATTEMPT HAS BEEN MADE YET
    if ((session != nullptr) && 
        (cleanupState == utils::CleanupState::NoCleanup))
    {
        try
        {
            CloseSession();
        }
        catch (...)
        { }
    }
}
 

The dtor calls CloseSession() to perform the cleanup, but only if no attempt has been previously made - that is, if the user didn't call CloseSession() explicitly. If such an attempt was made, we won't try again (even if it failed).

Also on the to-do list is applying boost's current_exception() to the catch block.

void SessionHandle::CloseSession()
{
    io_service ios;
    deadline_timer dt(ios);
 

    dt.expires_from_now(milliseconds(10));
    dt.async_wait(bind(AsyncLoopTimer, 
        protect(bind(&SessionHandle::DoCloseSession, this)), 
        boost::ref(dt), 10));
 

    cleanupState = utils::CleanupState::CleanupInProgress;
    ios.run();
 

    session = nullptr;
    cleanupState = utils::CleanupState::CleanupDone;
}


The use of a timer and a local io_service here avoids the whole outstanding handlers issue. Once cleanup is completed, we set the state accordingly, to make sure our dtor won't try to cleanup again.

utils::TaskState SessionHandle::DoCloseSession()
{
    int rc = libssh2_session_free(session);
 

    if (rc == LIBSSH2_ERROR_EAGAIN)
    {
        return utils::TASK_WORKING;
    }
 

    if (rc)
    {
        BOOST_THROW_EXCEPTION(SSHFreeSessionError() <<
            ssh_error_string("Error " + lexical_cast<string>(rc) + 
            " freeing SSH session"));
    }
 

    return utils::TASK_DONE;
}


Finally, the handler follows the pattern we set up a couple of months ago. So, nothing new here.

Incidentally, I said in that post that "This is equivalent to libssh2's ssh2_exec example, and I'll be discussing it here in the following posts". I haven't discussed the example here, as such, because as I started writing the posts about it (and actually thinking about its design), I became aware of its failings, and that's when the redesign began. However, the functionality is still identical; only the organization has changed.

This class is the first result of this redesign. It's simple, it's focused. And I suppose I'll learn at some point in the future it's not entirely right. But that's a good thing. Otherwise, how would we learn?

You can get all the code here, namely:
  • exception.h, which contains the base exception definitions.
  • sshexception.h, which contains the exception definitions for the ssh classes.
  • misc.h, which contains the enum with our cleanup state.
  • sessionhandle.h and sessionhandle.cpp. The class we've presented above.
And you'll also need this.

Yes, I know. I need to set up a place somewhere to store this code.

Tuesday 30 October 2012

Exception and Stack Trace

I'm using boost::exception. I like the usage pattern they suggest in their docs, I like the concept of having exception types as semantic tags to which I "append" arbitrary error info. In short, I find it brilliant.

However, I've hit a little problem - stack trace. I'd like something similar to Java's and C#'s stack traces, but I'm not ready to dive into backtrace or RtlCaptureStackBackTrace. So, I've decided to create a poor man's stack trace. On the downside, it'll trace only my own code; on the upside, I hope it will be less problematic than a real stack trace (this opinion is not based on actual experience, but rather on what I've found online about it - e.g., this).

I didn't actually consider many alternatives for this one - I needed a structure of arbitrary size that could hold BOOST_CURRENT_FUNCTION, __FILE__ and __LINE__. I've toyed with the idea of placing an std::vector in my exceptions. But I quickly realized that wasn't the way to go.

Why? Suppose we've got a function, LowLevelFunction() that throws a low_level_exception, which has a "stack trace" vector (STV) where we place the error location. This is caught further up the chain by HighLevelFunction(), which adds some more error info, adds its own location to the STV, and rethrows to its caller, HigherLevelFunction(). Now, HigherLevelFunction() was called by HighestLevelFunction(), which doesn't really care about low_level_exceptions; it won't touch anything other than a high_level_exception, except to send it directly to a logger (however, it does want the information contained in low_level_exception). So, HigherLevelFunction() throws a high_level_exception, which has its own STV. Now what do we do? Copy low_level_exception STV entries to high_level_exception? Add HigherLevelFunction()'s location to low_level_exception's STV? Use both STVs? There's no good solution, because the idea isn't sound.

However, as we said, HighestLevelFunction() wants all the relevant info contained in low_level_exception. And all this info must end up on a high_level_exception, since that's what it expects to catch. And the best solution I've found for this is nested exceptions, which is business-as-usual in Java and C#. And, thankfully, also in boost::exception.

We start with this typedef (from here):
typedef boost::error_info<struct tag_nested_exception, boost::exception_ptr>
    nested_exception;
And that means we can now do something like this:
void LowLevelFunction()
{
...
    BOOST_THROW_EXCEPTION(low_level_exception() << error_id(id) << 
        error_message(msg));
}

void HighLevelFunction()
{
...
    catch (low_level_exception& lle)
    {
        BOOST_THROW_EXCEPTION(low_level_exception() << 
            resource_name(res_name) << 
            nested_exception(current_exception()));
    }
}

void HigherLevelFunction()
{
...
    catch (low_level_exception& lle)
    {
        BOOST_THROW_EXCEPTION(high_level_exception() << 
            operation_value(oper_val) << 
            nested_exception(current_exception()));
    }
}

void HighestLevelFunction()
{
...
    catch (high_level_exception& hle)
    {
        // Finally, do something useful with all this info.
        // Hope the logger's still working...
    }
}
So, on the downside, we create two instances of low_level_exception, where we could've used just one. I can live with that, until evidence shows me it's a bad idea.

Yes, I know, there will be times when this may only make matters worse - e.g., if we catch an std::bad_alloc because we ran out of memory, creating exceptions is not the way to go. However, in a situation like this, I'd say our system is most likely headed for a fresh new (re)start, and there's little we can do to change its course.

Next on the list: Take a look at the implementation of diagnostic_information(). I like the completeness of its output, but not so much its formatting, especially when dealing with nested exceptions.
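For what it's worth, here's one way I picture flattening a nested chain into one level per line. It's a sketch only, built on the nested_exception error_info defined above (it needs <boost/exception/all.hpp>, <iostream> and <string>):

void PrintExceptionChain(boost::exception const& e, int depth = 0)
{
    // Print this level, then recurse into the nested exception, if any.
    std::cerr << std::string(depth * 2, ' ')
        << boost::diagnostic_information(e) << '\n';

    if (boost::exception_ptr const* nested =
        boost::get_error_info<nested_exception>(e))
    {
        try
        {
            boost::rethrow_exception(*nested);
        }
        catch (boost::exception const& inner)
        {
            PrintExceptionChain(inner, depth + 1);
        }
        catch (...)
        {
            // Not a boost::exception; nothing more to unwrap.
        }
    }
}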

On a side note, I've loved finding out about current_exception(). Finally, I can do something useful here:
SomeImportantClass::~SomeImportantClass()
{
    try
    {
        // Cleanup. May throw
    }
    catch (...)
    {
        // I'd like to log this, but what exactly did we catch?
        // current_exception() to the rescue
    }
}
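And when all I want is a log line, boost::current_exception_diagnostic_information() does the naming for me, even inside a catch(...). A tiny self-contained example (everything in it is illustrative):

#include <iostream>
#include <stdexcept>
#include <boost/exception/diagnostic_information.hpp>

int main()
{
    try
    {
        throw std::runtime_error("cleanup failed");   // stand-in for real cleanup work
    }
    catch (...)
    {
        // Describes whatever we caught, even though the handler is catch(...).
        std::cerr << boost::current_exception_diagnostic_information() << '\n';
    }
}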
Oh, and I've been having some fun with Perl, regex, and log files. I'll be diving into Boost.Regex soon, so I'm getting used to it in a more "dynamic" environment.

Friday 19 October 2012

The light at the end of the tunnel - Design is jelling

Here is a first version of the AsyncLoop functions. As I've said before, this functionality is mostly intact from our first working example.

What I particularly like about this solution is that we're separating SSH details from loop control. Our loop only cares about each iteration's task state - has it finished, or is it still going? This also means we don't need an object to drive these loops; free functions are enough.

The functions also have different names rather than being overloads: I prefer two distinct names to having to cast to pick the right overload every time I want to use these functions.

This new design has also revealed a problem that had escaped my attention. I've mostly been using AsyncLoopSocket() as my preferred loop driver, and some operations require starting both a read and a write on the socket. That, in turn, gives rise to handlers being called after the work is done, i.e., after we call sock.get_io_service().stop(). Since stop() doesn't actually clear io_service's queue, outstanding handlers will be called on that io_service's next reset()/run() cycle.

And this becomes a problem when we have the life cycle spread out across several classes. Let's take a look at, say, SessionConnection. As I explained in my previous post, this class handles the SSH session connection to a remote host (handshake/authentication and disconnect). We would like to use the socket as our driver for the disconnect loop. However, this loop runs on the dtor, and requires both a read and a write.

So, it's possible we'll have one or more events from this loop still on the queue after SessionConnection's dtor has finished execution. That means there's a risk of our io_service's next reset()/run() cycle calling a method for an instance of SessionConnection whose dtor has finished execution. "There's a risk" means "it probably won't happen most of the time, and when it does, it'll create some mysterious hard-to-debug problem". Which I'd rather avoid.

Even though I already had an alternative, I took a look at asio's source (despite my current semi-illiteracy when it comes to templates). The queue is cleared on io_service::~io_service(). This deletes a pointer to service_registry, so I took a look at its destructor:

service_registry::~service_registry()
{
  // Shutdown all services. This must be done in a separate loop before the
  // services are destroyed since the destructors of user-defined handler
  // objects may try to access other service objects.
  boost::asio::io_service::service* service = first_service_;
  while (service)
  {
    service->shutdown_service();
    service = service->next_;
  }

  // Destroy all services.
  while (first_service_)
  {
    boost::asio::io_service::service* next_service = first_service_->next_;
    destroy(first_service_);
    first_service_ = next_service;
  }
}

The first comment seems to hint that shutdown_service() doesn't clear outstanding handlers in the queue, and that this is done in destroy(), which deletes first_service_, a service* it receives as parameter. So, for a socket (and considering IOCP), we've got this to look for as a service type: typedef detail::win_iocp_socket_service<Protocol> service_impl_type.

I dug deeper into asio's source, and soon I was looking at code like this, in win_iocp_socket_service_base.ipp:

if (FARPROC cancel_io_ex_ptr = ::GetProcAddress(
        ::GetModuleHandleA("KERNEL32"), "CancelIoEx"))
  {
    // The version of Windows supports cancellation from any thread.
    typedef BOOL (WINAPI* cancel_io_ex_t)(HANDLE, LPOVERLAPPED);
    cancel_io_ex_t cancel_io_ex = (cancel_io_ex_t)cancel_io_ex_ptr;
    socket_type sock = impl.socket_;
    HANDLE sock_as_handle = reinterpret_cast<HANDLE>(sock);
    etc...

And that's when I've decided to stop. Yes, maybe there is a right way to do what I want buried somewhere in asio, but this is not where I want to invest my time right now. One of the reasons for using asio is precisely to keep some distance from OS details. I have an alternative, and I'll try it first: I'll use AsyncLoopTimer() for all operations on the dtors, since this doesn't fire two events.

I've also completed my first draft for SSHCommand's design:
  • SSHCommand will contain an instance of ChannelLocal (further work is required on these names, obviously), that does nothing on ctor and frees LIBSSH2_CHANNEL's local resources on dtor, calling libssh2_channel_free().
  • The method that implements command execution will have a block with a local instance of ChannelRemote, that opens an SSH channel on ctor, calling libssh2_channel_open_session(), and closes it on dtor, calling libssh2_channel_close(). In this block we'll execute the command, via libssh2_channel_exec(), and read its result, via libssh2_channel_read(). When we exit this block, i.e., after the channel is closed, we'll call libssh2_channel_get_exit_status() and libssh2_channel_get_exit_signal(). I expect it'll be something like this:
void SSHCommand::Execute()
{
    {
        // ChannelLocal WILL NEED THE LIBSSH2 CHANNEL
        // TO CALL libssh2_channel_free() ON DTOR,
        ChannelRemote cr(channelLocal);
        // AND WE'LL NEED IT, AFTER cr IS DESTROYED,
        // TO GET THE EXIT STATUS/SIGNAL
        channel = cr.GetChannel();
        
        ExecuteCommand();
        GetResult();
    }
    GetExitInfo();
}

SSHCommand's design is more complex because the life cycle involved is not quite symmetric. We've got open vs. close and free, and we can only get the exit status/signal after close. So, even though I've considered (again) placing all functionality on its ctor/dtor, I've decided against it, mainly to keep all dtors simple.
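To make that split a bit more concrete, here's how I currently picture the two channel handles. This is a sketch of the idea only - I'm passing the LIBSSH2_SESSION* in explicitly and ignoring the non-blocking loops and error handling, so don't read it as the eventual implementation:

class ChannelLocal
{
public:
    ChannelLocal() : channel(nullptr) { }                // does nothing on ctor
    ~ChannelLocal()
    {
        if (channel)
            libssh2_channel_free(channel);               // free the channel's local resources
    }

    void SetChannel(LIBSSH2_CHANNEL* c) { channel = c; }
    LIBSSH2_CHANNEL* GetChannel() const { return channel; }

private:
    LIBSSH2_CHANNEL* channel;
};

class ChannelRemote
{
public:
    ChannelRemote(LIBSSH2_SESSION* session, ChannelLocal& local) : channelLocal(local)
    {
        channelLocal.SetChannel(libssh2_channel_open_session(session));  // open the channel
    }
    ~ChannelRemote()
    {
        libssh2_channel_close(channelLocal.GetChannel());                // close it on the remote side
    }

    LIBSSH2_CHANNEL* GetChannel() const { return channelLocal.GetChannel(); }

private:
    ChannelLocal& channelLocal;
};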

A final note: I believe io_service needs more in terms of queue management. We should be able to clear the queue, query how many waiting handlers it contains, and set a limit on its size. I can even see scenarios where we could traverse all its elements, performing some action. If this couldn't be implemented without a performance penalty, it could be made a compile-time option, i.e., not active by default, but if you really need it, just activate it and rebuild.