How to write unreadable and unmaintainable code

From Sustainability Methods
Revision as of 13:42, 3 September 2024 by Gustavo

THIS ARTICLE IS STILL IN EDITING MODE

Introduction

In this guide, we will explore how to master the art of producing unreadable and unmaintainable code. By religiously adhering to the basic principles and examples below, you can be sure to remain the only person who really understands your submissions. Hence, no one can question your position of absolute knowledge and ability, and anyone following in your path will have to spend an enormous amount of time picking things up where you left off. Remember: clarity is overrated, embrace the chaos. Obfuscation is the key to job security!

This is a humble tribute to Roedy Green's evergreen essay "How To Write Unmaintainable Code".

1 Naming Conventions

1.1 Variable Naming

We start with an elementary yet powerful tool: variable and function naming. Forget all about descriptive, meaningful names. Instead, use single letters, ambiguous words or misleading names wherever possible. Some examples:

x = ...         # a confusion matrix
temp = ...      # global variable holding the main dataset
foo = bar = ... # just about everything else

1.2 Function Naming

The same goes for functions. Make sure that a function always does less or more than its name would suggest. As an example, the function drop_nan_rows(df) should also convert the dataframe to the long format and publish the result on Kaggle.

def drop_nan_rows(df):
    # 1 drop nan rows
    # 2 convert df to long format 
    # 3 publish the result on kaggle
    pass
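Since a Kaggle upload is hard to demonstrate, here is a smaller, runnable sketch of the same principle. It assumes the rows are stored as a list of tuples instead of a DataFrame, and the function name drop_none_rows is hypothetical: it drops rows containing None as promised, but also deduplicates and sorts them without telling anyone.

```python
def drop_none_rows(rows):
    """Drop rows containing None (and quietly do a few other things)."""
    # 1. the advertised part: keep only rows without missing values
    cleaned = [row for row in rows if None not in row]
    # 2. the surprise: remove duplicates and reorder, which the name never mentions
    return sorted(set(cleaned))


rows = [(2, "b"), (1, None), (2, "b"), (0, "a")]
print(drop_none_rows(rows))  # [(0, 'a'), (2, 'b')]
```

Any caller who relies on the original row order or on duplicate counts will find out the hard way.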

1.3 Acronyms, Acronyms, Acronyms

Use acronyms wherever possible in the code as well as in comments. Make sure to never define them anywhere. Whoever does not intuitively understand your acronyms is not on your level and therefore does not deserve to comprehend your genius anyway.

wtda = [150, 200, 350, 275, 300]  # weekly transaction data array
pmcbv = [1.5, 2.2, 3.1, 2.8, 2.0]  # product metric calculation base values

sowt = sum(wtda)
mtviwd = max(wtda)
atv = sowt / len(wtda)
wswpm = sum([a * b for a, b in zip(wtda, pmcbv)])
nwm = wswpm / mtviwd
cmar = (nwm * atv) / sowt

1.4 Smurfing

Smurf naming conventions can be a beautiful tool for obfuscation. Overly long variable names with redundant prefixes come with a list of advantages:

  • when done right, some lines will be too wide for the screen, forcing the reader to scroll sideways or zoom out to an eye-damaging extent;
  • more room for mix-ups and spelling mistakes;
  • higher verbosity of the code without any added clarity.

smurf_financial_account_balance = 1000
smurf_financial_account_transaction_limit = 500
smurf_financial_account_number = '12345XYZ'
smurf_financial_account_holder = 'Mr. Smurf'
smurf_financial_account_transaction_history = []

def smurf_process_transaction(smurf_account_transaction_amount):
    global smurf_financial_account_balance
    if smurf_account_transaction_amount <= smurf_financial_account_transaction_limit:
        smurf_financial_account_balance -= smurf_account_transaction_amount
        smurf_financial_account_transaction_history.append(smurf_account_transaction_amount)
        return True
    else:
        return False

1.5 Mixing Case Systems

There are different opinions, even disputes, about the right case system. PEP8 recommends snake_case for variables and CamelCase for classes. However, why restrict yourself to one format instead of doing justice to the multitude of opinions out there? Mix snake_case, CamelCase and random capitalization. Bonus points for using all three in a single variable!

data_loader = ...
DataProcessor = ...
dataMO_DEL = ...
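Mixing case systems works especially well because Python identifiers are case-sensitive: names that differ only in capitalization or underscores are entirely separate variables. A minimal sketch with hypothetical names:

```python
# three spellings of "data model", three unrelated variables
data_model = "snake_case"
dataModel = "camelCase"
DataModel = "CamelCase"

# a later typo silently creates a fourth variable instead of raising an error
datamodel = "typo"

print(data_model, dataModel, DataModel, datamodel)
```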

1.6 Creative Synonyms

The beauty of a language is sometimes measured by the nuance and variety with which thoughts can be expressed in it. Use a dictionary or a search engine to find synonyms. If possible, never use the same word twice. Vaguely imply that there are subtle differences where there are none.

# in this case we use
def display():
    pass

# but sometimes also
def present():
    pass

# how about
def show():
    pass

# in other cases
def visualize():
    pass

1.7 Reuse Variables for Efficiency

Being dynamically typed, Python allows you to change the contents and data type of any variable at any point. You can use this to minimize the total number of variables used in your code. For advanced usage, break the boundaries by letting local variables shadow global variables, package names and standard functions. Mixing number bases is also fun.

pandas = [0o120, 150, 180]
average = sum(pandas) / len(pandas)
print("Average:", average)

pandas = {"count": len(pandas), "avg": average}
print("Data Summary:", pandas)

pandas = "Total count: " + str(pandas['count'])
print(pandas)

pandas = lambda x: x * average
result = pandas(2)
print("Result:", result)
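The same trick works on built-in functions. The sketch below, with hypothetical variable names and assuming the code runs at module level, shadows the built-in sum with an integer; the next call to sum then fails, and only deleting the global name brings the built-in back.

```python
numbers = [0o120, 150, 180]  # 0o120 is octal for 80
total = sum(numbers)         # still the built-in sum: 410

sum = total                  # shadow the built-in with an int
try:
    sum(numbers)             # 'sum' now names 410, which is not callable
except TypeError as err:
    print("TypeError:", err)

del sum                      # remove the global name; the built-in is visible again
print(sum(numbers))          # 410
```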

2 Comments

Since comments are ignored by the interpreter, you can pretty much do what you want here. While it is good to avoid commenting in general, doing so might raise suspicion, and you could be forced to add comments later. So it is best to use comments to lie, deceive and state redundant or obvious information.

2.1 Comment the 'What', Never the 'Why'

Our fundamental principle of commenting shall be to comment on what the reader sees in the code anyway. This way, you demonstrate the ability to formulate what is happening in plain English. It will seem like you did a great job of adding comments everywhere, while all you actually did was state mostly redundant information. People after you will forever wonder about your intentions and the reasons for your decisions, leaving them in eternal awe:

import pandas as pd

# read csv file data.csv
df = pd.read_csv('data.csv')

# drop the rows 1,2,3
df = df.drop([1,2,3])

# reset the index
df = df.reset_index()

2.2 Lies and Contradictions

In most cases, you don't have to lie in the comments intentionally. The contradictions will arise by themselves once you stop updating comments along with changes to your code. Save yourself the trouble and watch the chaos gradually unfold.

3 Formatting

3.1 PEP8: Public Enemy Number 1

Adherence to PEP8? Not in our world! Mix tabs and spaces. Indent randomly. Make sure your lines of code resemble abstract art. The more inconsistent, the better.

def  calculate_profit(revenue,expenses ) :
         profit=revenue-expenses
         return profit

Avoid dubious code formatters like autopep8, black or yapf, and stay away from linters of any kind.
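If you do mix tabs and spaces, be aware that Python 3 will not always play along: when the meaning of an indentation depends on the tab width, the compiler raises a TabError. The sketch below checks this with the built-in compile(); the helper name compiles is hypothetical.

```python
def compiles(source):
    """Return True if the source compiles, False if it raises a TabError."""
    try:
        compile(source, "<mixed>", "exec")
        return True
    except TabError:
        return False


mixed = "if True:\n\tx = 1\n        y = 2\n"       # tab, then eight spaces
consistent = "if True:\n    x = 1\n    y = 2\n"   # four spaces throughout

print(compiles(mixed), compiles(consistent))  # False True
```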

3.2 Method Chaining: Pure Aesthetics

Method chaining is nice, especially when multiple operations are performed in sequence on the same dataframe in pandas. Not only is it practical; done right, it also offers an aesthetic way of creating extremely long and hard-to-read lines of code.

import pandas as pd

df = pd.read_csv('data.csv')

result = df.assign(D=lambda x: x['A'] + x['B'], E=lambda x: x['C'] - x['A']).query('D > 3').groupby('D').agg({'E': 'mean'}).rename(columns={'E': 'Average_E'}).reset_index().sort_values(by='Average_E', ascending=False).pipe(lambda x: x[x['Average_E'] > 15]).drop_duplicates(subset=['D']).reset_index(drop=True).loc[:, ['D', 'Average_E']]

If other students are not willing to decipher this Python enigma, they have no business working with your code in the first place. So, at all costs, avoid splitting the method calls over multiple lines:

result = (df.assign(D=lambda x: x['A'] + x['B'], E=lambda x: x['C'] - x['A'])
            .query('D > 3')
            .groupby('D')
            .agg({'E': 'mean'})
            .rename(columns={'E': 'Average_E'})
            .reset_index()
            .sort_values(by='Average_E', ascending=False)
            .pipe(lambda x: x[x['Average_E'] > 15])
            .drop_duplicates(subset=['D'])
            .reset_index(drop=True)
            .loc[:, ['D', 'Average_E']]
         )

4 Structure and Dependencies

Code structure is the tactical nuclear missile in our arsenal for code obfuscation. Especially with longer programs, avoid using packages, classes and functions for reusability and clarity. Instead, use them as headings!

Only create monolithic, 'do-everything' functions. Aim for at least 1000 LOC per function. The deeper you nest your loops within said functions, the better. After all, a handful of nesting levels is enough to overwhelm the short-term memory of your fellow humans, especially when they can't see the start and end of each block on screen simultaneously.

def do_everything(data):
    # 1000+ lines of code that do everything from data cleaning to model training and serving
    pass
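For a taste of how quickly nesting piles up even in a tiny function, here is a hypothetical process function (name deliberately vague) that hides a simple filter-and-sum behind five indentation levels. It assumes records are dictionaries mapping field names to values.

```python
def process(records):
    # sum all positive numeric values whose key starts with "amount"
    total = 0
    for record in records:
        if record:
            for key, value in record.items():
                if key.startswith("amount"):
                    if isinstance(value, (int, float)):
                        if value > 0:
                            total += value
    return total


print(process([{"amount": 5, "name": "x"}, {}, {"amount_eur": -3, "amount_usd": 2.5}]))  # 7.5
```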

Further, use as many libraries and dependencies as possible, especially if they do the same thing. You never know what you might need in the future. Import modules in the middle of your files, even if you don't use anything from them or they were already imported earlier. Never provide a requirements.txt file or a pip freeze output. Never pin the versions of your dependencies. Compatibility is an afterthought and must not stop your progress.

# Importing a whole module from which you only use a single, trivial math function later
import pandas as pd

# Use wildcard imports for maximum namespace pollution
from math import *
from numpy import *

# Import internal components of a package, which might change in future versions
from sklearn.ensemble._forest import RandomForestClassifier

# Import with unclear names, making it hard to understand where functions come from
from datetime import datetime as td

# Import modules and don't use them at all
import sys
import os
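To see what wildcard imports can do, here is a stdlib-only stand-in for the math/numpy pair above, using cmath instead of numpy (a substitution made so the example runs without third-party packages). The second import silently replaces math.sqrt with its complex-valued namesake:

```python
from math import *   # brings sqrt, pi, e, ... into the namespace
from cmath import *  # ...and overwrites sqrt, pi, e, ... with the cmath versions

root = sqrt(4)
print(root, type(root))  # (2+0j) <class 'complex'>
```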

References & Further reading

1. http://www2.imm.dtu.dk/courses/02161/2018/files/how_to_write_unmaintainable_code.pdf

2. https://peps.python.org/pep-0008/

3. https://docs.python-guide.org/writing/structure/

4. https://mitcommlab.mit.edu/broad/commkit/coding-and-comment-style/

5. https://archive.org/embed/the-elements-of-programming-style-second-edition

6. https://www.oracle.com/docs/tech/java/codeconventions.pdf


The author of this entry is Jann Pfeifer.