Monday, December 14, 2009

A Good Programmer?

I came across this blog post via Hacker News tonight, and it gave me a little food for thought.

I should know better, of course, than to just take stuff like this as gospel truth, but I hear it a lot from people like Jeff Atwood, who make their living talking about programming. To be a "good" programmer you need to be the kind of person who just loves it, and does it all the time.

The second one I don't know that I agreed with too much, mostly because I know plenty of people who talk up "bleeding-edge" technologies only because they're bleeding-edge, and who couldn't even begin to actually program in them if they wanted to because they lack even the most basic skills. This is primarily what I run into with kids singing the praises of the latest Microsoft technology (not to piss on Microsoft technology necessarily, but there's a reason for that). However, taken with the rest of the list, it's a little more understandable. I'll still hold on to my dreams of kernel hacking, though. ;)

The one that really hit me in a tender area was the last one..."If your potential programmer didn’t do any programming before university, and all his experience starts when she got her first job, she’s probably not a good programmer." Ouch. That describes me almost to a T. Granted, I started programming in my undergrad career while pursuing another degree, and the Master's was technically an extension of a "hobby", but before that I had never done any programming. In CS 120 I had to go to my professor's office for help because I didn't know what FTP was. Yes, it was that bad.

I had no access to any resources to even begin to understand how to do it, and didn't know what to look for anyway. It has been the primary source of my low self-confidence in my programming ability the entire time I have been attempting to make the computer bend to my feeble will. Even now, when I know I've improved so much, I still never feel like I've worked hard enough or dedicated myself enough to improving my skill. I've tinkered with a wide variety of languages but am still very much a C++/Java person.

Anyway, expressing my insecurity is not particularly helpful...I'm off to start reading more books and working on more projects.

Saturday, December 12, 2009

Two Changes!

I've moved my portfolio page to CodeMonkeyInc on Google Sites because, frankly, I am fully capable of writing a website backend, but I couldn't design my way out of an empty pool. Not to say I haven't tried really hard, but I lack the requisite skills in terms of creating backgrounds and other important images that look clean and professional, rather than like I made them up in Gimp after dicking around for a half hour. Also, I have yet to actually BUY hosting, so my iweb account will be going down after I graduate anyway.

The other thing I did this weekend was post a few of my personal projects to Launchpad so I could show them off on the portfolio page. Right now my projects on there aren't extremely impressive, but ScribbleMidi is coming along really well, and I'm anticipating having a semi-working system soon. Launchpad is a wonderful, free way to publicly post your open source projects.

Sunday, December 6, 2009

Battle With the PHP Script From Hell

In seminar class we have to write a script that takes in a gigantic (3.2MB) text file data dump, parses the data out, and inserts it into a MySQL database for an application we're working on. The first pass was done by a classmate, and although it got the job done quickly (averaging around 13 seconds), the other requirement was that the script be easy for non-programmers to modify, and this was not even remotely easy to modify (it took me half an hour to add one line). So I took it upon myself to rewrite the whole thing, and so far the result has been an interesting exercise.

Problem 1: The data is delimited by XML-style tags, but is not in an XML structure.
Problem 2: Some of the records (collections of data that represent one art piece) are invalid, as they just describe different image file names for a single art piece.
Problem 3: Some of the data is unique (such as the style, technique, etc), and the values are often in a list separated by semicolons.
Problem 4: The current version of my rewrite takes well over 400 seconds to run.
Problem 5: I had pretty much a weekend to write this.

The data dump and the way its records are structured are unavoidable. I approached the problem by reading in one record at a time and passing it through a series of functions to pull out the appropriate values, then inserting them into the MySQL database. It does this one query at a time, however, which I suspect is part of the problem.
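To make the record-at-a-time approach concrete, here is a minimal sketch in Python rather than PHP. The dump's real layout isn't shown here, so the `<tag>value</tag>` line format, the blank-line record separator, and the names `parse_record` and `iter_records` are all assumptions for illustration, not the actual script.

```python
import re

# Assumed line format: <tag>value</tag>. The real dump's tags are not shown in this post.
TAG_LINE = re.compile(r"<(\w+)>(.*?)</\1>")

def parse_record(lines):
    """Turn the lines of one record into a dict of tag -> list of values."""
    record = {}
    for line in lines:
        match = TAG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not look like <tag>value</tag>
        tag, raw = match.groups()
        # Semicolon-separated lists become individual, trimmed values.
        values = [v.strip() for v in raw.split(";") if v.strip()]
        record.setdefault(tag, []).extend(values)
    return record

def iter_records(stream):
    """Yield one parsed record at a time (records assumed separated by blank lines)."""
    buffer = []
    for line in stream:
        if line.strip():
            buffer.append(line)
        elif buffer:
            yield parse_record(buffer)
            buffer = []  # start fresh for the next record
    if buffer:
        yield parse_record(buffer)
```

Because `iter_records` is a generator, only one record's lines are held in memory at a time, no matter how big the dump is.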

The first step in improving is exploring MySQL's REPLACE statement. I'm currently running a query that checks whether the current value already exists in a table and adds it only if it doesn't. Making these required entries unique should remove the need for these extra queries.

The result? Down to around 330 seconds, not as bad. The primary keys are a little screwed up, as expected, but since it's an auto-incremented number, it isn't a huge deal.
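Here is a small sketch of the idea, again in Python and with SQLite's in-memory database standing in for MySQL (SQLite accepts `REPLACE INTO` as well). The `style` table and the `add_style` helper are made up for illustration. It also shows why the keys end up a little off: replacing an existing row deletes it and inserts a new one, which gets a fresh auto-incremented ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE style ("
    " styleID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL UNIQUE)"
)

def add_style(name):
    # One statement, no preceding SELECT: the UNIQUE constraint handles duplicates.
    conn.execute("REPLACE INTO style (name) VALUES (?)", (name,))

for name in ["Impressionism", "Realism", "Impressionism"]:
    add_style(name)

rows = conn.execute("SELECT styleID, name FROM style ORDER BY styleID").fetchall()
print(rows)  # [(2, 'Realism'), (3, 'Impressionism')]
```

The second "Impressionism" replaced row 1 with a new row 3, so ID 1 is simply gone. If stable IDs mattered, MySQL's `INSERT IGNORE` would be an alternative that skips the duplicate instead of rewriting it.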

At this point the primary bottleneck is in the bridge tables. Here's how this works: all the bridge tables simply connect an art piece with its corresponding style, technique, etc. So there's a style table, which is only a list of styles, but we need to take the artID (one select query), then select the corresponding styleID, and put them in one table. This wouldn't be so bad except it's 2 queries in a row for each of the tables; that's a lot of individual queries.
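One possible way to cut down those lookups, sketched in Python with SQLite standing in for MySQL, is to let the database find both IDs inside the INSERT itself. This is not what the script currently does; the table names, column names, and the `link_styles` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE art (artID INTEGER PRIMARY KEY, title TEXT UNIQUE);
    CREATE TABLE style (styleID INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE art_style (artID INTEGER, styleID INTEGER,
                            PRIMARY KEY (artID, styleID));
    INSERT INTO art (title) VALUES ('Sunset');
    INSERT INTO style (name) VALUES ('Impressionism'), ('Realism');
""")

def link_styles(title, style_names):
    # Both IDs are looked up by the INSERT ... SELECT, not by separate SELECTs.
    conn.executemany(
        "INSERT INTO art_style (artID, styleID) "
        "SELECT art.artID, style.styleID FROM art, style "
        "WHERE art.title = ? AND style.name = ?",
        [(title, name) for name in style_names],
    )

link_styles("Sunset", ["Impressionism", "Realism"])
links = conn.execute(
    "SELECT artID, styleID FROM art_style ORDER BY styleID"
).fetchall()
print(links)  # [(1, 1), (1, 2)]
```

That is one statement per art/style pair instead of two selects plus an insert. Whether it actually helps on the real data would need to be timed.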

**Note: at this point I realized I had made an extremely stupid error: I kept adding onto the records array rather than clearing it after each record was processed. *facepalm!*
Fixing that major leak got the script down to 131 seconds.
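Here is a toy Python illustration of why an array that never gets cleared hurts so much. It is one plausible reading of the bug, not the actual PHP: `pending` is a stand-in for the records array, and the inner loop stands in for whatever work walks over it.

```python
def process(records, clear_after_each):
    """Return how many items get handled in total."""
    pending = []  # hypothetical stand-in for the records array
    handled = 0
    for record in records:
        pending.append(record)
        for _item in pending:  # work grows with len(pending)
            handled += 1
        if clear_after_each:
            pending.clear()  # the fix: drop what has already been processed
    return handled

leaky = process(range(1000), clear_after_each=False)
fixed = process(range(1000), clear_after_each=True)
print(leaky, fixed)  # 500500 1000
```

Without the clear, the work grows quadratically with the number of records. With it, the work stays linear.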

After making a huge difference with the array I managed to cut it down even more by fixing the art table creation. This function was using two different queries to build the table, which was unnecessary. It's now running at around 16 seconds!

Right now I'm pretty happy with where the script is at, so I'll save the optimization of the bridge tables for later.