Interview with Julian Egelstaff, Technical Architect at Freeform Solutions
I recently had a chance to catch up with Julian Egelstaff, Technical Architect at Freeform Solutions. Julian attended last year’s SQL Server JumpIn! Camp where he worked with Microsoft engineers and other participants to add SQL Server support to Formulize, a PHP application that lets you to quickly create forms and reports in your website. Julian is the co-founder of Freeform Solutions, has 10 years of PHP development experience, and is a Zend Certified Engineer.
Besides simply catching up, my aim in talking with Julian was two-fold:
- In general, I wanted to better understand how he overcame the challenges that come with extending an application to support multiple databases, and…
- Specifically, I wanted to better understand how he overcame the challenges he faced in adding SQL Server support to an application.
I like to think that hindsight is 20-20, so with the passing of a few months since JumpIn! Camp, I thought this would be a good time to reflect on those challenges. Fortunately, Julian was willing to share his experience…
Brian: First, I know you are very busy, so I want to say thanks for taking the time to talk with me: Thanks! :-) Maybe you could start by telling me a little about your development background?
Julian: Sure. I am, I think, a fairly typical PHP developer - if the typical PHP developer is not a programmer by training. PHP has this reputation for being easy to learn, and there are lots of people using PHP who don't necessarily have computer backgrounds, and with my bachelor of journalism degree, I'm squarely in that camp.
However, I think I'm fairly atypical in that along the way, I went and co-founded a technology startup called Freeform Solutions, and did most of the development work in the organization for the first several years. I got my ZCE certification along the way. Just to make things more atypical, Freeform Solutions was not a regular startup. We're a not-for-profit organization, and we're focused primarily on helping other not-for-profits use technology to meet their missions.
Brian: One of the solutions you offer to non-profits is Formulize. And, it’s the application you worked on at JumpIn! Camp. Can you tell me more about Formulize?
Julian: Formulize grew out of the early work at Freeform Solutions. There were two major influences that shaped it.
First, there was a common need among many organizations we worked with for some simple data management tools… forms and reports, standard stuff that everyone needs. Except the not-for-profits we worked with, they didn't have the resources to have IT people constantly tweaking and building on some reporting framework for them. They needed to be able to add new forms themselves, and have easy access to the information that people put into the forms, with no ongoing support by anyone.
So Formulize is primarily a solution to that kind of problem. It's kind of like a CMS for forms and data, instead of for "content". A non-technical user can create a form, describe through the GUI how it fits into the organization's workflow patterns, how it's related to data in other forms, etc. And then they can also make up basic, and not-so-basic, reporting screens and other interface screens so people can interact with the form and its data.
The second influence was our need, my need, to be able to reuse everything we did as much as possible. So from version 2 onwards, Formulize was architected in a very generalized way. It can be used in a CMS-like way by non-technical users, but under the hood, it's architected more like a framework.
Now, everybody says they're a framework these days, especially in the CMS world because everyone is trying to cover all the bases. But I think Formulize is unique in this respect. All other systems that I'm aware of, that come from a CMS tradition, might be framework-ish, and let you write code for that system in a framework-ish way. But your code is still stuck inside that system. You can't install your Drupal modules inside Wordpress, and vice versa.
Formulize can be plugged into Drupal and Wordpress, and any other PHP system because of its very generalized architecture. It makes no assumptions about where it is being called from. After JumpIn! Camp, I spent about an hour poking around in TangoCMS (one of the other projects that was at the camp) and an hour later I had a working proof-of-concept Formulize application inside TangoCMS. That's what I think makes Formulize really interesting from a code point of view.
Brian: That is interesting…and sounds like it could be fodder for another blog post. :-) It’s interesting that Formulize has a framework-like architecture – I’m sure that made the work of adding multi-database different than it would be for lots of other applications. Before getting into those details, can you tell us why you wanted to add multi-database support to Formulize (specifically SQL Server support)?
Julian: Well at the most basic level, it's just plain good open source practice to be as compatible as possible with anything and everything.
But at a more pragmatic level, I knew there were a lot of assumptions in the code related to MySQL. The opportunity to see how far we could get with MSSQL was a good way to learn where the roadblocks are and what we'd have to do to get around them. That's useful for MSSQL, and any other database that we want to add support for.
Also, we've been talking with the Web Platform Installer people at Microsoft about adding Formulize to the applications it can install, and it would of course be nice to have support for the SQL Server for users who are on a fully Microsoft platform.
Brian: When it came to adding multi-database support, my understanding is that you made a choice early on to not use PDO or some other database abstraction layer. Is that correct? Can you elaborate?
Julian: Well, originally, that’s correct. As with a lot of open source projects I think, we didn't decide anything, we just ran with what we had. Formulize started life as a fork of another code base (long story) and basically we were just following the lead from that. The fork we started with only had MySQL support. Since MySQL was always available in the environments we run in, this was never an issue. So, when accessing MySQL, we use a class that is built around the procedural php_mysql driver. Also, Formulize predates PDO, so that's another factor. But it's not really PDO/not-PDO that's the main issue here. PDO just provides connection, real DB abstraction is the more serious issue.
But, as I followed the path from non-technical PHP developer to a been-around-the-block-a-few-times PHP developer, it always seemed incorrect to me that we were so tied to one particular data store. But, like with much open source software, unless someone using it has an actual need, new features don't usually get added. Everyone we know of was happy enough to do zero work and stick with the existing database option, i.e. MySQL. With that said, there are some users out there who are using Formulize in heterogeneous environments where there are many different databases, and apps doing different jobs throughout their whole IT system. I know there are some people who are using IIS and probably have SQL Server in their organization. Giving them the option to use Formulize just made good sense for everybody.
All that said, the idea of a database abstraction layer is appealing, so when we added SQL Server support, we wrote a class that uses the PDO_SQLSRV driver. We would still like to convert our MySQL access code to PDO, but that hasn’t happened yet.
Brian: So when you sat down to add SQL Server support, what was your plan of attack?
Julian: Well, I figured that most of the basic issues that we'd hit first would be syntactical things. Just differences between how your write SQL for MySQL vs. SQL Server. So I wanted to see just how bad some of that would be, what kinds of differences are we talking about.
I'm a strong, strong believer in doing as little as necessary in the code when making changes. This comes from the heritage of trying to reuse everything as much as possible, and never having enough time and budgets to do the typical "this-version-rewritten-from-scratch" kind of thing that you see in some open source projects.
So my goal was to find what was the smallest set of changes we could make that would therefore likely have the least impact on the existing code base, and end up providing SQL Server support. If the existing SQL queries were mostly OK, then that would be a big step in that direction.
Brian: How did you identify SQL and server-specific differences (between MySQL and MSSQL) when adding MSSQL support?
Julian: Trial and error was the biggie. :-) But there are other areas where we knew, a priori, that there were roadblocks, like the encryption option.
You can choose to encrypt data in Formulize, at the database level, if you really, really want to secure it against that kind of attack. We use the MySQL AES encryption function, we embed it in the SQL statements. So architecturally, that's going to be a problem for abstracting support for other databases.
What we found worked for about 90% of the features in Formulize, was simply string manipulation of the SQL just prior to sending it to the DB. There's a bunch of predictable syntactical differences between MySQL and MSSQL, such as backticks versus square brackets, SHOW COLUMNS needs to change to a SELECT statement on the information schema, the names of field types are systematically different, ie: char=>nchar, and on and on.
Brian: Can you elaborate on how you resolved the encryption issue?
Julian: Well, that is one area where there isn't yet a committed fix. The encryption feature is something of an "edge case", it's not widely used, as far as I'm aware, so it was pretty low on the list of things to tackle. From what I've read, I think the EncryptByKey and DecryptByKey functions may do the same kind of thing in SQL Server as the AES_ENCRYPT and AES_DECRYPT functions do in MySQL. But this is something I would want to get more info about from the SQL Server devs if possible. Nothing is ever simple.
Brian: I’ll see if I can get you in touch with some folks that could help with that. In the meantime, can you provide an example of how string manipulation solved the predictable syntactical differences you mentioned?
Julian: Well, there's really simple examples, like this:
$sql = str_replace("NOW(), ", "SYSDATETIME(), ", $sql);
NOW is a MySQL function, SYSDATETIME is simply the equivalent function in SQL Server. But there's also more advanced kinds of stuff, like when we're getting info about the fields in a table. In MySQL, you use something called SHOW COLUMNS but in SQL Server, you have to query the "information_schema". So we do some more advanced analysis:
if(substr($sql, 0, 12)=="SHOW COLUMNS") {
$fromPos = strpos($sql, "FROM");
$tableName = substr($sql, $fromPos+5);
$sql = "SELECT column_name
FROM information_schema.columns
WHERE table_name='" . trim($tableName) . "'";
if($likePos = strpos($tableName, "LIKE")) {
$likeName = substr($tableName, $likePos+5);
$sql = substr($sql, 0, 69+$likePos);
$sql .= "' AND column_name = ".trim($likeName);
}
}
So that basically rewrites a SHOW COLUMNS statement from MySQL, into a completely new SQL Server query. And just pulls out the necessary parameters from the original statement, and sticks them into the SQL Server version in the right place. All just using simple string functions in PHP. This is a pretty fast operation in PHP, it's just string manipulation in memory, so it adds very little overhead to your queries.
Something important to emphasize though, is that this is not an exhaustive replacement for SHOW COLUMNS in MySQL. It's simply an effective replacement for how we're using it in Formulize. For example, we’re not handling the situation where SHOW COLUMNS has a WHERE clause.
So we could get away with simple solutions like this that covered our use cases just fine. It does set up a maintenance issue though. If we add a SHOW COLUMNS statement with a WHERE clause, then we have to do more work here. So moving eventually to real DB abstraction would be preferable from that point of view. But this code works here works today, and it's a very minimal intervention, and took very little time to create.
Brian: I know that you faced a challenge that is common when adding SQL Server support to a MySQL application: dealing with the MySQL LIMIT clause. How did you get around the LIMIT clause in your SQL Server implementation?
Julian: In true open source fashion, we borrowed from another project. :-)
The good folks at Moodle had already integrated MSSQL support into their app, and they had written a function that took some SQL and two parameters for offset and limit, and returned to you a valid MSSQL statement that would mimic the behavior of a LIMIT clause. It uses the TOP syntax.
So basically, we just had to do some string analysis on our statements prior to execution, and if there was a LIMIT statement, then we remove it, grab the offset and limit values, and then basically just do one line of code:
$sql = $this->limit_to_top_n($sql, $offset, $limit);
Brian: Nice! That’s the most elegant solution I’ve seen for this problem. :-) It looks like there were some issues with that Moodle function, but they have been resolved: http://tracker.moodle.org/browse/MDL-25321.
I’m sure you ran into some API discrepancies (php_mysql vs. php_pdo_sqlsrv). Were you able to address these as elegantly?
Julian: Well, not so much discrepancies, as idiosyncrasies that we had to be aware of, or double check how they worked.
One was returning the id of the last record that was inserted into the database. This is a pretty standard thing in lots of open source web apps that use MySQL. You do an insert statement and you get back the ID of the record you just inserted. You're probably going to turn around and use that ID as a foreign key in some other statements.
In MSSQL, the official way to deal with these kinds of situations, I think, is that you're probably going to package up a bunch of statements and do them as a transaction, and you've got the database engine handling the foreign key situation for you. But lots of open source apps that use MySQL are written to be compatible with the MyISAM storage engine, which doesn't support transactions.
So for reasons like that, in a lot of web apps you end up handling many things at the PHP layer, which you might otherwise do in the database if you had a different database engine.
The PDO SQLSRV driver did provide the tools we needed, so we could get the necessary visibility in PHP about what the DB was doing. But we had to verify, for instance, that the lastInsertId method was bound to our database connection. I think you checked that one for me actually. Thanks. :-)
And I think lastInsertId is only available in PDO, not the regular version of the SQLSRV PHP driver. Is that right?
Brian: That’s correct. If you are using the procedural SQLSRV API, there isn’t a lastInsertId function. To achieve the equivalent functionality, you would have to use the SQL Server SCOPE_IDENTITY() function. You could write a function to do this, something like…
function lastInsertId($conn){
$stmt = sqlsrv_query($conn, "SELECT SCOPE_IDENTITY() AS id");
$row = sqlsrv_fetch_array($stmt);
return $row['id'];
}
This would allow you to then do something like this (with a table that has an auto-incremented id column):
// Execute an INSERT.
$sql = "INSERT INTO Table_4 (data)
VALUES ('some_data')";
$stmt = sqlsrv_query($conn, $sql);
// Call custom function to get ID.
echo lastInsertId($conn);
But, since you are using the PDO driver, you don’t need that. Any other API differences that you had to address?
Julian: Yes. We had to use a "scrollable cursor" in order for MSSQL to be able to tell us the number of rows in a result. That's something else which we were relying on knowing in PHP at certain points. We are setting the scrollable cursor it as part of the call to the prepare method, prior to executing the query. I believe that was the only place where we could force this behavior, I don't think the connection options with the PDO driver supported this.
Brian: Right. In the SQLSRV API (as opposed to the PDO_SQLSRV API) you set a scrollable cursor in your connection options array when connecting to the server.
So, there were several challenges you had to tackle, but none of them seemed in surmountable. In retrospect, was going with “string manipulation” (as opposed to using an abstraction layer) a good choice?
Julian: Well, we knew what our inputs would be, ie: we didn't have to write a complete SQL parser to take any arbitrary MySQL statement and spit out a MSSQL equivalent. So our problem space was pretty well constrained, and therefore, for most of our DB interactions, this simple string "translation" approach worked fine.
It's the other lower level stuff, like the encryption, and certain kinds of math and other stuff beyond the basics that we backed off and thought a more structural approach will be required.
For example, we make extensive use of subqueries in some parts of the code. It gets kind of complicated to use string analysis to tell what is going on in your query when there are SELECT statements inside SELECT statements. I mean, is it a good idea to write a SQL parser in PHP? That's the job of the database engine, no? So when your string analysis has to get that advanced and you're not just doing glorified search-and-replace anymore, then maybe your effort is better spent on introducing some other layer that crafts the query from scratch in a way that is compatible with your database.
But, in the case of Formulize, string manipulation was a good choice - it lets us "release early, release often." It fully works, but it doesn't cover 100% of the problem space. But this way we can focus our refactoring on only the areas that really need it, instead of reinventing the entire way Formulize talks to the database.
This is sort of related to worse-is-better (http://www.jwz.org/doc/worse-is-better.html). I'm obviously valuing minimal-effort-to-working-code instead of a 100% "complete and correct" approach. It all depends what your priorities are, and to some extent, your philosophy.
Brian: Was anything easy about adding SQL Server support?
Julian: Well, yes...the data importing/migration. There are good tools for converting a MySQL database to MSSQL, like the SQL Server Migration Assistant. So that was nice, we didn't have to fight with anything there. It migrated our schemas and data without any hassle.
I was developing on Windows, so I didn't have any problem connecting to MSSQL, but the lack of a Linux driver for connecting to MSSQL is a huge barrier to adoption for devs who develop or deploy on Linux.
Brian: Yes, SQL Server access from PHP on Linux is a huge ask from PHP developers. I know that management in the SQL Server organization are aware of the request.
I’ve taken a lot of your time, so I’ll only ask one more question. Do you have any advice for developers looking to add new server support to an application?
Julian: Don't be afraid of it…that would be the big thing. You will learn a lot about the new server for sure, but you'll also learn a lot about your own app, because you'll have to look at some of your assumptions from a different perspective.
Like the whole lastInsertId issue. They're subtle things, but they arise from deep architectural decisions. They show how hard it is to actually draw boundaries between your programming language and your data store. The capabilities of each part of your stack, will impact your application in subtle and not so subtle ways. It's hard to completely separate these layers of your app. Conversely, if you do completely separate them, are you making optimal use of the tools at your disposal? But if you want maximum portability for your app/code/system, you have to generalize to a certain extent.
In my opinion, these are the kinds of issues that make computer science interesting, not writing complex loops or OO code (which isn't boring, but you know what I mean).
Brian: Agreed! I’m sure that any major undertaking, such as adding multi-database support, will take you down learning paths that you didn’t expect. Thanks for sharing and good luck with Formulize and Freeform Solutions!
Be sure to check out the Freeform Solutions website. You can learn more about Julian on his Freeform Solutions team page or by following him on twitter: @jegelstaff.
-Brian