
Using GREP + SORT + UNIQ to find occurrences of a repeated event by an id field

Use Case: you have a log file full of entries that start with a timestamp. A process kept crashing, and each time it restarted it reprinted the same warning, so the same occurrence appears many times. We want just one match for the first time each occurrence happened. Luckily, my log file has an identifier to key off of.

My Log File

grep "Description Mismatch" logfile.log

[2015-10-24 16:30:01.655] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
[2015-10-24 16:45:01.672] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
[2015-10-24 17:00:02.073] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
[2015-10-24 17:00:02.146] [WARN] scheduler - Description Mismatch 562b997fb0e2bbb208f1f7dd CRM Description: -- Auto Created by Callinize Callinize Description: gq
[2015-10-24 17:15:01.815] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
The first three lines and the last one are the same occurrence, just logged at different times. First we sort using the `-k` option; `8,8` means "sort on the 8th whitespace-separated field only", which is the ID "562b955c8c01d13309889115":

grep "Description Mismatch" logfile.log | sort -k 8,8
[2015-10-24 16:30:01.655] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
[2015-10-24 16:45:01.672] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
[2015-10-24 17:00:02.073] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
[2015-10-24 17:15:01.815] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm

Next, let's use uniq with -f n, which skips the first n fields before comparing lines, to collapse the duplicates. (With -f 8 the comparison starts just after the ID; -f 7 would start the comparison at the ID itself.) Now we have:

grep "Description Mismatch" logfile.log | sort -k 8,8 | uniq -f 8

[2015-10-24 16:30:01.655] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 CRM Description: -- Auto Created by Callinize Callinize Description: vm
[2015-10-24 17:00:02.146] [WARN] scheduler - Description Mismatch 562b997fb0e2bbb208f1f7dd CRM Description: -- Auto Created by Callinize Callinize Description: gq
[2015-10-24 18:00:04.052] [WARN] scheduler - Description Mismatch 562ba7a41cce2fd10843296f CRM Description: -- Auto Created by Callinize Callinize Description: sold

Final Command

Tack on a wc -l to get a count:

grep "Description Mismatch" logfile.log | sort -k 8,8 | uniq -f 8 | wc -l
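If you'd rather not reason about sort/uniq field offsets, the same result can be had with a single awk filter that keeps the first line seen for each ID in field 8. A minimal sketch (the sample lines below are abbreviated stand-ins for the real log format):

```shell
# Build a small sample log (abbreviated stand-in for the real file)
printf '%s\n' \
  '[2015-10-24 16:30:01.655] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 vm' \
  '[2015-10-24 16:45:01.672] [WARN] scheduler - Description Mismatch 562b955c8c01d13309889115 vm' \
  '[2015-10-24 17:00:02.146] [WARN] scheduler - Description Mismatch 562b997fb0e2bbb208f1f7dd gq' \
  > /tmp/sample.log

# Keep only the first line seen for each ID (field 8)
awk '!seen[$8]++' /tmp/sample.log

# Count distinct IDs
awk '!seen[$8]++' /tmp/sample.log | wc -l
```

Because awk preserves input order, the surviving line for each ID is the earliest occurrence, so no sort is needed.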


Bash Script for monitoring ulimit for multiple processes

Use the script I posted here:

Then pipe its output to a file so you can analyze the sockets used by your app over time.
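For reference, here's a minimal sketch of that idea (the 60-second interval and the /proc-based counting are my assumptions, and /proc requires Linux): it emits a timestamped open-descriptor count for a PID until the process exits.

```shell
# fdmon.sh -- minimal sketch; prints one "timestamp count" line per minute
cat > /tmp/fdmon.sh <<'EOF'
#!/bin/bash
PID=${1:?usage: $0 <pid>}
while [ -d "/proc/$PID/fd" ]; do
    # Count this process's open file descriptors (includes sockets)
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(ls /proc/$PID/fd | wc -l)"
    sleep 60
done
EOF
chmod +x /tmp/fdmon.sh
```

Run it in the background and redirect to a file, e.g. `/tmp/fdmon.sh 12211 >> fd-counts.log &`.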


How to determine what's causing: "Error: connect EMFILE" (Node.js)

I came across a bunch of posts on stackoverflow suggesting I use something like graceful-js to solve my problem, or to increase ulimit. Essentially, they’re workarounds. I wanted to know what the root problem was. Here’s the process I came up with. If anyone knows a better way, please let me know in the comments.

What This Error Means

There is a limit to the number of file handles a process can have open. Note that sockets also create a file handle. Once you reach the limit you cannot open any more, and a cryptic error message such as "Error: connect EMFILE" will end up in your log file (hopefully). The default limit (at least on my Ubuntu system) is 1024.

How To Isolate

This command will list the open network handles for nodejs processes: lsof -i -n -P | grep nodejs

nodejs    12211    root 1012u  IPv4 151317015      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1013u  IPv4 151279902      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1014u  IPv4 151317016      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1015u  IPv4 151289728      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1016u  IPv4 151305607      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1017u  IPv4 151289730      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1018u  IPv4 151289731      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1019u  IPv4 151314874      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1020u  IPv4 151289768      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1021u  IPv4 151289769      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1022u  IPv4 151279903      0t0  TCP> (ESTABLISHED)
nodejs    12211    root 1023u  IPv4 151281403      0t0  TCP> (ESTABLISHED)

Notice the 1023u on the last line: since numbering starts at 0, that's the 1024th file handle, which is the default maximum.

Now look at the last column, which indicates the open resource. You'll probably see a number of lines all with the same resource name; that tells you where to look in your code for the leak.

If you have multiple node processes, you can isolate one by using the PID in the 2nd column.
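For example, to count handles for just one PID (using 12211 from the listing above; awk's $2 is lsof's PID column):

```shell
# Count open network handles for a single process
lsof -i -n -P | awk '$2 == 12211' | wc -l

# Or let lsof filter by PID itself (counts all handle types, not just network)
lsof -p 12211 | wc -l
```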

In my case above, I noticed that there were a bunch of very similar IP Addresses. They were all 54.236.3.###

So, I started doing iplookups on all the different 3rd party services I used… loggly, newrelic, pubnub… until I ultimately determined it was pubnub. Turned out we were creating a new socket each time we published an event instead of reusing.

Command Reference

To get a count of open handles for a certain PID (8465 in my case), I used this command after triggering various events in my app:

lsof -i -n -P | grep "8465" | wc -l

root@ip-10-101-42-209:/var/www# lsof -i -n -P | grep "nodejs.*8465" | wc -l
root@ip-10-101-42-209:/var/www# lsof -i -n -P | grep "nodejs.*8465" | wc -l
root@ip-10-101-42-209:/var/www# lsof -i -n -P | grep "nodejs.*8465" | wc -l

What is your process limit?

ulimit -a

The line you want will look like this: open files (-n) 1024
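To check just that value directly, and (as a stopgap, not a fix for the underlying leak) raise the soft limit for the current shell before launching your app:

```shell
ulimit -n        # current open-files soft limit, e.g. 1024
# Raising beyond the hard limit fails; raising the hard limit may require root
ulimit -n 4096 2>/dev/null || echo "could not raise limit"
```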


Setting created and modified user via SugarCRM Webservice API

A Callinize user recently reported that they were getting email notifications for each Call being created. They had assignment notifications turned on, so whenever created_by_user !== assigned_user_id, an email gets sent to the assigned user that looks like:

{The_API_User} has assigned a Call to {Assigned_To_User}.

Start Date: 2014-01-03 11:44:35 UTC(+00:00)
Duration: 0h, 3m
Description: {description}

You may review this Call at:

So, I figured we're just not setting the created user… this should be an easy fix. Well, for me at least… it wasn't. Special thanks to Jason Eggers of SugarOutfitters, without whom I would never have figured this out.

Flags Required

In order to modify them, you must include these flags. And no, it’s not a typo… they’re named inconsistently.

    "set_created_by" => false,
    "update_modified_by" => false,

Fields Needed

Again, not a typo… they’re just inconsistent.

    "created_by" => $userCrmId,
    "modified_user_id" => $userCrmId,

Final Snippet

Putting it all together, here’s the snippet I use in my set_entry calls:

    // Sets: assigned, created, and modified user to $userCrmId
    "set_created_by" => false,
    "update_modified_by" => false,
    "assigned_user_id" => $userCrmId,
    "created_by" => $userCrmId,
    "modified_user_id" => $userCrmId,

    // NOT NEEDED. Kept because:
    // 1. In case a future API makes them more consistent (don't want to revisit).
    // 2. Doesn't break anything (surprisingly!).
    // 3. Will help people find this post when searching on Google ;-)
    "set_modified_by" => false,
    "update_created_by" => false,
    "created_by_id" => $userCrmId,
    "modified_by_id" => $userCrmId,
    "modified_by" => $userCrmId,
    "created_user_id" => $userCrmId,

Note: I tested this with Sugar v4_1 api via rest. I’ve heard it might have changed in the newer rest apis that Sugar 7 has. If anyone knows, please add a comment.

EDIT: After going through all this… It turns out there is a bug in 6.7.1 which was the real problem. I think in Sugar 6.5 and below assignment notifications are only triggered for records which are updated (not created) and the assigned_user_id changes. Please upvote this issue:


SugarCRM, SQL query to determine Sales Cycle From New Account --> Customer

The SQL query below runs against the SugarCRM audit tables to determine how long it took an Account record to become a "Customer", using the audit rows to find when the field was changed.


SELECT accounts.id, accounts.industry, accounts_audit.before_value_string, accounts_audit.after_value_string,
accounts.date_entered, accounts_audit.date_created AS `Became Customer`,
TIMESTAMPDIFF(DAY, accounts.date_entered, accounts_audit.date_created) AS `How Long`
FROM accounts_audit
JOIN accounts ON accounts_audit.parent_id = accounts.id
JOIN accounts_cstm ON accounts.id = accounts_cstm.id_c
WHERE accounts_audit.after_value_string = 'Customer' AND accounts.deleted = '0'





Creating an "API Only" User in SugarCRM

There are lots of reasons for API-only users: perhaps you're integrating with a marketing automation system, you developed a custom portal, or you have a SugarCRM telephony integration (CTI) like Callinize.  In the past I had just used the admin account, but that wasn't ideal: if the credentials got out, that user has the ability to log in.

The other thing I wanted was the ability to know when records were created by the system.  By creating a separate user for your job, the records show up as created by "Your Integration User" instead of some other generic user.

But, if you’re on a paid version of sugar, you might not want to fork over another $30+ a month.  The good news is there is a solution. CE users don’t care about this since they’re not paying per user.  Sugar Enterprise users by default have the “Create Portal User” option…. But what about us Sugar Pro customers?  Nothing seems obvious at first.

Fortunately, you can add a line to config_override.php and then you too will have the “Create Portal API User” option.

$sugar_config['enable_web_services_user_creation'] = true;


Now, when you go to Admin -> User Management and mouse over the Users module tab, there is an option for creating a "Portal API User", just like Sugar Enterprise customers have.



SugarCRM Rest API Example: How to get all contacts for an account

This relies on the following PHP Wrapper Class:

For a slightly better formatted answer see my Stackoverflow Post Here.

/**
 * Returns an array of contacts that are related to the accountId passed as a param.
 * The array returned will be an array of associative arrays.
 * @param $accountId
 * @param array $contactSelectFields optional; sets the fields to return, defaults to id, email1, name, title, phone_work, and description
 * @return array
 */
public function getAllContactsAtOrganization($accountId, $contactSelectFields = array("id", "email1", "name", "title", "phone_work", "description")) {

    $fields = array(
        "Accounts" => array("id", "name"),
        "Contacts" => $contactSelectFields
    );
    $options = array(
        'where' => "accounts.id = '$accountId'"
    );
    // $sugar is an instance of the PHP wrapper class referenced above
    $apiResult = $sugar->get_with_related("Accounts", $fields, $options);

    $contacts = array();
    foreach ($apiResult['relationship_list'][0]['link_list'][0]['records'] as $almostContact) {
        $curr = array();
        foreach ($contactSelectFields as $key) {
            $curr[$key] = $almostContact['link_value'][$key]['value'];
        }
        $contacts[] = $curr;
    }

    return $contacts;
}

Sample Return

    Array
    (
        [0] => Array
            (
                [id] => 47e1376c-3029-fc42-5ae2-51aeead1041b
                [email1] =>
                [name] => Blake Robertson
                [title] => CTO
                [phone_work] => 8881112222
                [description] => Opinionated developer that hates SugarCRM's REST API with a passion!
            )

        [1] => Array
            (
                [id] => 4c8e3fcf-8e69-ed7d-e239-51a8efa4f530
                [email1] =>
                [name] => Carolyn Smith
                [title] => Director of Something
                [phone_work] => 832-211-2222
                [description] => She's a smooth operator...
            )
    )

For Reference Purposes

Here’s the “rest-data” (nicely formatted)

Used print_r of the php array

    Array
    (
        [session] => 9j7fm4268l0aqm25kvf9v567t3
        [module_name] => Accounts
        [query] => accounts.id = 'e583715b-7168-5d61-5fb1-513510b39705'
        [order_by] =>
        [offset] => 0
        [select_fields] => Array
            (
                [0] => id
                [1] => name
            )

        [link_name_to_fields_array] => Array
            (
                [0] => Array
                    (
                        [name] => contacts
                        [value] => Array
                            (
                                [0] => id
                                [1] => email1
                                [2] => name
                                [3] => title
                                [4] => phone_work
                                [5] => description
                            )
                    )
            )

        [max_results] => 20
        [deleted] => FALSE
    )

Post Body



Fulltext search engines for mediawiki: Solr vs Sphinx

I tried out both on our company's MediaWiki instance.

Both provide much better search results than the default MediaWiki search.  Here are my thoughts…


Sphinx

Was slightly easier to set up.  Doesn't include wiki tags in the search results.  Documentation was better.

SOLR (lucene)

Preferred the search results; I think it gives better weighting to title matches.  In my case, Solr was already installed on my box (I used a Bitnami stack).  Documentation wasn't as good; my contractor had to email the developer about a couple of things.  The other pro is there are more tutorials for getting other information into Solr than there are for Sphinx.  Since I'm going to be indexing my MediaWiki instance plus my knowledge base, this made it a no-brainer for me.

I plan on figuring out how to strip the mediawiki tags out and get it to highlight the keyword matches.  I’ll post back when I do. 

Here’s some screenshots:




Generating an avatar for customers with a splash of utility

It’s kind of a pet peave of mine when I see a bunch of empty contact photos. So, when we started rolling out a support desk program (we’re using FreshDesk), the last thing I wanted every day was to have a bunch of blank contact photos staring back at me.

I recently wrote a script which will generate icons for customers who haven’t set a profile picture. The result I think has some nice utility to it as well…

For each customer, I generate an icon consisting of their initials and a rectangular bar of color. The color of their initials is unique to them, and the bar of color is the same for each customer in an organization.  This allows you to glance at the list of tickets and more easily recognize the contacts / companies.

To generate the "random" color, I simply take an MD5 hash of their name or company name, then take the first 6 hex digits and use that as the color value.  Pretty easy.
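As a quick sketch of that hashing trick in shell ("Acme Corporation" is just an example name; md5sum is from GNU coreutils, and on macOS the equivalent command is `md5`):

```shell
# Derive a stable hex color from a name: first 6 hex chars of its MD5
name="Acme Corporation"
color="#$(printf '%s' "$name" | md5sum | cut -c1-6)"
echo "$color"   # the same name always yields the same color
```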

I wrote a script and API wrapper for FreshDesk which can iterate through all the contacts for your organization and update them all in one step!  That code is available on github here:  

In case you’re wondering, why didn’t I just use gravatar?  Only about 5% of our customers have their work email addresses (which we correspond with) associated with a gravatar profile.  So, this is the next best alternative.

There’s a lot of other cool things you could do with something like this… You could have maybe special colors indicating support priority levels… add little stars or flags maybe… Please post back if you come up with a better design.  I ended up using Future MD Cn Bold.  




SugarCRM - Update Calculated Fields Nightly

SugarCRM has some powerful calculated fields called Sugar Logic. One limitation though is they are only calculated when a field is saved. I wanted a simple way to display the number of days since X occurred.

So, I created a very simple script which can be run from the Scheduler every night. It basically just iterates through all the records of a user-configurable module type and forces the calculation to update. Best of all, it does this without changing the modified date or modified-by user, and without creating a tracker entry.

This script could also be used one time to just initially seed all the calculated values for a module. The suggestion from the admin guide is to do a mass update but that’ll cause modified dates, etc to change.

The code is hosted on github here:

Potential Applications:

  1. Days since lead was last contacted.
  2. Days since order added to system.
  3. Days since a support case was entered.
  4. (There’s lots)

My original application was to keep track of how long it was taking our company to ship orders once we had all the information from a customer.

  1. I created a workflow rule such that when order_stage == Production Ready, I set the date field "production_ready_date".
  2. I then created a calculated field called "days_since_production_ready" and set the formula to something like abs(daysSince($production_ready_date)).
  3. I installed the script from my GitHub repo and created a scheduler job to run at 3am every night.

We now use the field in a dashlet on our order status tab, and it's used in some reports to look back over the past year, see how well the production team is doing at getting orders out, and analyze the orders that took a long time so we can identify ways to speed them up.

Alternate Strategies

An alternative to using calculated fields would be to create a nightly task which just updates the fields directly. I prefer the calculated-field approach for the following reasons…

  1. I can add new calculated fields using Studio and not have to update code at all on the server. You can also have multiple calculated fields per module.
  2. Since Sugar Logic is also applied in JavaScript, if a date field is changed manually (instead of via a workflow rule), the calculated field updates in real time in the edit view and will be accurate on the detail view as well. Otherwise you'd need to create an on-save logic hook to do that, if nightly synchronization wasn't enough.

The other alternative is to not calculate the days at all: just keep a date field and design reports to only show records within a certain time range. This was the approach I initially took, but I find it makes creating reports a lot harder. Also, if you want to calculate something like the average time to fulfill an order, you can't do that with Sugar's reporting at this time. Plus, it's just easier in my opinion to look at an integer such as 14 vs. 2013-02-18. It also simplifies custom formatting of rows in my dashlets: it's fairly easy to write some jQuery code to highlight based on a number, whereas having to parse dates and calculate them would be slower and more of a pain.