2022-11-20: New Stuff I Learned: Grep and Databases

I recently learned a couple of new tricks, and I want to share a practice I engage in that I have gotten some compliments on.

I have worked on a few Java web applications, and instead of searching with the IDE, I use grep in cygwin. Sometimes the term I search for is not just in a Java file, but it is also in a Javascript file that has no line breaks. (Seriously, why do people make files like this?) This makes the results harder to read. One possibility is to pipe the results of the grep to another grep:

grep -riIn someString * | grep -v '.min.js' | grep -v '.js.map'

The disadvantage of this method is you lose the color in the output. I read the man page, and I found there is an exlusion flag: –exclude. Even better, you can use it more than once:

grep -riIn --exclude=*min.js --exclude=*.js.map someString * 

I also found out the -I option can exclude binary files, which were also polluting the output.

The database we use is Oracle. I started using the Explain Plan to analyze some queries. You can significanly reduce the cost of a query by using upper to compare strings. Going from this:

where some_column like 'some string%'

to this:

where upper( some_column ) like upper ( 'some string%' )

reduced the cost of a few queries from 20,000 to 40,000 to between 30 to 40. I thought that using “upper” would take longer, since it has to run that comparison on the whole table, and that using equals would be faster, but using “upper” is faster. I do not know if this also holds true for other databases. On Oracle “upper” seems to be faster than “lower”.

Another thing I want to mention is that I have gotten a few compliments on how I format insert statements. I line the column names up, and I put in comments in the values() part of the statement for the name of each column.

You have probably seen insert statements like this:

insert into us_state( state_id, state_name, capital,largest_city, abbreviation, population, median_age, persons_sq_mile,nickname, area_sq_mile, avg_elevation, north_latitude_deg, north_latitude_min, south_latitude_deg, south_latitude_min, east_longitude_deg, east_longitude_min, west_longitude_deg, west_longitude_min, created_date, created_by, modified_date, modified_by)
values (func_to_get_next_id(), 'Illinois', 'Springfield', 'Chicago', 'IL', 12882135, 39, 231, 'Prairie State', 57915, 600, 42, 30, 36, 58, 87, 30, 91, 31, sysdate(), user_id, sysdate(), user_id);

I have some that are worse than that. Imagine that statement with no spaces after the commas.

Compare that to this:

insert into us_state( 
    state_id,           state_name,         capital,            largest_city,       
    abbreviation,       population,         median_age,         persons_sq_mile,
    nickname,           area_sq_mile,       avg_elevation,      north_latitude_deg, 
    north_latitude_min, south_latitude_deg, south_latitude_min, east_longitude_deg,
    east_longitude_min, west_longitude_deg, west_longitude_min, created_date,       
    created_by,         modified_date,      modified_by 
) values (
    func_to_get_next_id(), -- state_id
    'Illinois',            -- state_name
    'Springfield',         -- capital
    'Chicago',             -- largest_city
    'IL',                  -- abbreviation
    12882135,              -- population
    39,                    -- median_age
    231,                   -- persons_sq_mile
    'Prairie State',       -- nickname
    57915,                 -- area_sq_mile
    600,                   -- avg_elevation,
    42,                    -- north_latitude_deg
    30,                    -- north_latitude_min
    36,                    -- south_latitude_deg
    58,                    -- south_latitude_min
    87,                    -- east_longitude_deg
    30,                    -- east_longitude_min
    91,                    -- west_longitude_deg
    31,                    -- west_longitude_min
    sysdate(),             -- created_date
    user_id,               -- created_by
    sysdate(),             -- modified_date
    user_id                -- modified_by
);

It is a little more typing up front, but it is less thinking later on [1].

I also reformat numbers in my notes. I keep notes as I work on stories/defects. If I get an id or a long number from a database, I will copy it, and then add dashes every third number:

4198275630/419-827-563-0
5832064197/583-206-419-7

Years ago I worked on a project where the IDs were a string based on the date down to the nanosecond with no dashes or separators. So I was dealing with very long strings of about 30 characters, half of which I did not need. I got tired of parsing long numbers in my head. More typing means less thinking. The fewer plates I have to spin in my head, the better. Granted, a lot of things that could reduce my cognitve load are usually out of my control, but I change what I can change.

You’re welcome.

Note 1: The formatting is lined up in the WordPress editor; sometimes it looks crooked in the final version. Things like this are why I sometimes think about going with a static site.

Image from Golden Gospels of Henry III, aka Codex Aureus of Speyer or Speyer Gospels (Speyerer Evangeliar), an 11th-centry manuscript housed in the Royal Site of San Lorenzo de El Escorial; image from Wikimedia assumed allowed under public domain.