in “major” databases there is sometimes an ETL (Extract, Transform, Load) tool. as DataFrames are the “commonly” used data structure in Python for similar operations (and analysis), you can perform all three functions with them. That said, i prefer to do only the ‘E’ and ‘L’ at the file level, as they are “simply” accomplished by built-in functions. Doing the ‘T’ during the import would require me to use a for loop and read each row through a file handle, so it’s more “convenient” for me to manipulate the data once it’s imported.
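a minimal sketch of that split, assuming pandas and a small, made-up CSV held in memory (the column names and the cleaning rules are hypothetical, not taken from the repository): the ‘E’ and ‘L’ are single built-in calls (`read_csv` and `to_csv`), and the ‘T’ happens on the DataFrame after import, without a row-by-row loop.

```python
import io

import pandas as pd

# Hypothetical CSV data standing in for a real source file (an assumption).
RAW_CSV = """name,score
alpha,10
bravo,
charlie,7
"""


def extract(source) -> pd.DataFrame:
    """E: read the raw data with a built-in reader."""
    return pd.read_csv(source)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """T: clean the data after it's imported, working on whole columns."""
    df = df.dropna(subset=["score"])             # drop rows with a missing score
    df = df.astype({"score": int})               # scores are whole numbers again
    df = df.assign(name=df["name"].str.upper())  # vectorised string operation
    return df.reset_index(drop=True)


def load(df: pd.DataFrame, target) -> None:
    """L: write the result with a built-in writer."""
    df.to_csv(target, index=False)


raw = extract(io.StringIO(RAW_CSV))
clean = transform(raw)
out = io.StringIO()  # an in-memory "file"; a real path would work the same way
load(clean, out)
print(out.getvalue())
```

working on a whole column at once (e.g. `.str.upper()`) is usually both shorter and faster than looping over rows yourself.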

it’s important to note that determining which dataset to use can involve unconscious/implicit bias. therefore, in analysis (and when offering insights), you need to consider the source: no matter the prevailing “wisdom”, you need to distinguish between fact and opinion.

here is the updated GitHub repository:

https://github.com/LinsAbadia/Python/tree/master/DataFrames

there are many ways to instantiate a DataFrame, but here’s a primer on the typical ways to create one.

the DataFrame (provided by the pandas library) is the primary data structure in Python for data science. it acts like a spreadsheet or a database table; it kind of reminds me of the DataWindow object in PowerBuilder (it was very convenient for me). And unlike with most high-level programming languages, it didn’t need a “connector” (or driver) like ODBC (Open DataBase Connectivity) or JDBC (Java DataBase Connectivity) to interact with external databases; you were lucky if there was a “native” one, because it performed faster, as there was no need to “translate” stuff.
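a short sketch of three common ways to build the same DataFrame with pandas (the letter/code-word data is just sample data, not from the repository): from a dictionary of lists, from a list of dictionaries, and from a list of rows with explicit column names.

```python
import pandas as pd

# 1. From a dictionary of lists: each key becomes a column.
df_from_dict = pd.DataFrame({
    "letter": ["A", "B", "C"],
    "code_word": ["Alfa", "Bravo", "Charlie"],
})

# 2. From a list of dictionaries: each dictionary becomes a row.
df_from_records = pd.DataFrame([
    {"letter": "A", "code_word": "Alfa"},
    {"letter": "B", "code_word": "Bravo"},
    {"letter": "C", "code_word": "Charlie"},
])

# 3. From a list of lists (rows), naming the columns explicitly.
df_from_rows = pd.DataFrame(
    [["A", "Alfa"], ["B", "Bravo"], ["C", "Charlie"]],
    columns=["letter", "code_word"],
)

# All three produce the same table.
print(df_from_dict.equals(df_from_records) and df_from_dict.equals(df_from_rows))
```

the dictionary-of-lists form is often the most compact when the data is already grouped by column; the list-of-dictionaries form suits data that arrives one record at a time.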

here is my updated GitHub repository:

https://github.com/LinsAbadia/Python/tree/master/DataFrames

when i started preparing stuff for DataFrames, it seemed sensible to introduce the Python Dictionary.

in it, i use the NATO phonetic alphabet; NATO, as we all know, is an acronym. Another form of abbreviation is an initialism. Both use the first letters of words to form a “new” word, but the former is pronounced as a word, while the latter is voiced initial by initial (like AI for Artificial Intelligence).

as part of forming the subheadings, i had to work out whether to use “duplicate” or “replicate” (the two are often used interchangeably).
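a minimal sketch of dictionary basics, using a few letters of the NATO phonetic alphabet as sample data (the variable names are hypothetical, not from the repository). it also shows that, in Python, assigning a dictionary to a new name does not duplicate it; `dict.copy()` makes a shallow copy and `copy.deepcopy()` a deep one.

```python
import copy

# A small slice of the NATO phonetic alphabet: letter -> code word.
nato = {"A": "Alfa", "B": "Bravo", "C": "Charlie"}

# Look-ups and updates are by key.
print(nato["B"])            # Bravo
nato["D"] = "Delta"         # add a new key/value pair

# Assignment does NOT duplicate: both names refer to the same dictionary.
alias = nato
alias["E"] = "Echo"
print("E" in nato)          # True

# A shallow copy is a separate dictionary...
shallow = nato.copy()
shallow["F"] = "Foxtrot"
print("F" in nato)          # False

# ...but nested objects are still shared, unless you make a deep copy.
nested = {"letters": ["A", "B"]}
shallow_nested = nested.copy()
deep_nested = copy.deepcopy(nested)
shallow_nested["letters"].append("C")  # also changes nested["letters"]
deep_nested["letters"].append("Z")     # does not affect nested
print(nested["letters"])    # ['A', 'B', 'C']
```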

here’s the updated repository:

https://github.com/LinsAbadia/Python/tree/master/DataStructures

it’s complicated

November 3, 2019

i put up a draft of QuickSort implemented in Python; admittedly, i’m open to suggestions to further improve it, and to any other examples that will help understanding. As with my experiences before, it was “difficult” for me to find a “simple” explanation online. Since some programming languages implement it as part of a standard library, some ICT professionals aren’t familiar with its internal workings and don’t bother to learn it. i’m all for black boxes and abstraction, but when trying to master a language it helps to implement the fundamentals: this not only sharpens one’s thinking (sort, pun intended, of a form of mental gymnastics) but also helps one familiarise oneself with the intricacies/quirks of a language.
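for comparison, here is a minimal sketch (not the draft in the repository) of two common ways to write QuickSort in Python: a short version that builds new lists around a pivot, which is easy to read, and an in-place version using the Lomuto partition scheme, which shows the swapping that happens “inside” the algorithm. the function names are hypothetical.

```python
from typing import List, Optional


def quicksort(items: List[int]) -> List[int]:
    """Return a new sorted list; the input list is left unchanged."""
    # Base case: zero or one item is already sorted.
    if len(items) <= 1:
        return list(items)
    # Choose a pivot; the middle element is one common, arbitrary choice.
    pivot = items[len(items) // 2]
    # Split the items into three groups relative to the pivot.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Sort the outer groups the same way, then glue everything together.
    return quicksort(smaller) + equal + quicksort(larger)


def partition(items: List[int], low: int, high: int) -> int:
    """Lomuto partition: put items[high] (the pivot) in its final place."""
    pivot = items[high]
    boundary = low  # everything left of boundary is <= pivot
    for j in range(low, high):
        if items[j] <= pivot:
            items[boundary], items[j] = items[j], items[boundary]
            boundary += 1
    # Move the pivot just after the last item that is <= pivot.
    items[boundary], items[high] = items[high], items[boundary]
    return boundary


def quicksort_in_place(items: List[int], low: int = 0, high: Optional[int] = None) -> None:
    """Sort items[low..high] in place."""
    if high is None:
        high = len(items) - 1
    if low < high:
        p = partition(items, low, high)
        quicksort_in_place(items, low, p - 1)   # items before the pivot
        quicksort_in_place(items, p + 1, high)  # items after the pivot


data = [5, 3, 8, 1, 9, 2, 7]
print(quicksort(data))  # [1, 2, 3, 5, 7, 8, 9]
quicksort_in_place(data)
print(data)             # [1, 2, 3, 5, 7, 8, 9]
```

the first version uses extra memory for the new lists; the second rearranges the original list. both take O(n log n) time on average and O(n²) in the worst case (for example, when the pivot keeps landing at one end).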

this absence of “simple” resources seems to be due to a number of things. my direct experience is that it is sometimes due to the attitude and education/training of technical personnel. some of them just want to feel superior to/smarter than the rest of us; their “hang-ups” from school are evident, and they in turn mistreat others, which is why, IMHO, hazing practices persist. some act, understandably, as “gate-keepers” to try and make this knowledge exclusive in order to protect their jobs (i.e. economic reasons) or status (i.e. social motivations) or both. and while most of them are capable enough to understand, they are not clever enough, equipped, or motivated (there’s an obvious misalignment of objectives) to make these concepts “easily digestible” for others. their apparent willingness to help masks their hubris or condescension: a humble brag of sorts. this makes me question my own motivations.

while i don’t recall it being discussed (probably due to my specialisation), it may have been covered in passing in one of my master’s courses; either way, before this endeavour i could no longer remember exactly how it worked.

the updated GitHub repository can be found at:

https://github.com/LinsAbadia/Python/tree/master/Problems/Algorithms

 

“easy” A

October 30, 2019

first, it was the Tower of Hanoi. And then it was QuickSort. i wanted to provide “simpler” explanations of concepts i was taught during my undergraduate days. i eventually understood them (with some effort), but they are trickier to “share” with others. And like most technologists, i substantially underestimated the time and effort needed to realise these. i guess i was strongly swayed by my experience in “simplifying” the normal forms when i taught databases.

i think i need to take a cue from the much greater individuals who preceded me, or just use great ideas from others and properly attribute their work. in any case, i should reacquaint myself with these concepts and refresh my memory, so that my unconscious mind can continue to work on them while i focus on something else. hopefully, it will not be long between “Eureka” moments.

prime (directive)

October 21, 2019

i moved on to a function that determines whether a number is prime; i’m still struggling with how to make the Tower of Hanoi problem “simpler” (as Einstein put it) to understand. i always knew that “0” was not prime, but now i know why not: by definition, a prime is a natural number greater than 1 whose only positive divisors are 1 and itself. i was taught in school that “1” was prime, but it isn’t according to that definition, since 1 has only one positive divisor.
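a minimal sketch of such a check using trial division (the function name `is_prime` is hypothetical, and this is not necessarily how the repository does it): numbers below 2 are rejected outright, which covers both 0 and 1, and only divisors up to the square root of n need to be tried, because if n = a * b with a <= b, then a * a <= n.

```python
def is_prime(n: int) -> bool:
    """Return True if n is a natural number greater than 1 whose only
    positive divisors are 1 and itself."""
    # 0 and 1 (and negative numbers) fall outside the definition.
    if n < 2:
        return False
    # 2 is the only even prime.
    if n % 2 == 0:
        return n == 2
    # Try odd divisors up to the square root of n.
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


print([k for k in range(20) if is_prime(k)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```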

Here’s the updated GitHub repository:

https://github.com/LinsAbadia/Python/tree/master/Problems/Exercises

tower of Babel

October 17, 2019

for the last few days, i’ve been held up by the “Tower of Hanoi” problem. at first it was just a matter of debugging and getting the code to work as expected; however, i realised the real difficulty was in being able to explain the algorithm more “simply” and in “plain English”. i’m still thinking about how to do this.
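one common way to put the recursive solution in plain English is: to move n disks, move the top n - 1 disks out of the way onto the spare peg, move the largest disk to the target peg, then move the n - 1 disks back on top of it. a minimal sketch of that idea follows (the peg names and the function signature are hypothetical, not taken from the repository).

```python
from typing import List, Tuple


def hanoi(n: int, source: str, target: str, spare: str,
          moves: List[Tuple[int, str, str]]) -> None:
    """Append the moves that shift n disks from source to target."""
    if n == 0:
        return  # nothing left to move
    # 1. Move the top n - 1 disks out of the way, onto the spare peg.
    hanoi(n - 1, source, spare, target, moves)
    # 2. Move the largest remaining disk straight to the target peg.
    moves.append((n, source, target))
    # 3. Move the n - 1 disks from the spare peg on top of it.
    hanoi(n - 1, spare, target, source, moves)


moves: List[Tuple[int, str, str]] = []
hanoi(3, "A", "C", "B", moves)
for disk, src, dst in moves:
    print(f"move disk {disk} from {src} to {dst}")
print(len(moves))  # 7, i.e. 2**3 - 1
```

for n disks this produces 2^n - 1 moves, which is why the puzzle gets long very quickly as disks are added.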