Posts

Showing posts from May, 2019

[Node.js] Single-threaded Event Loop

1) A simple analogy What is a single-threaded event loop? Hopefully the following analogy of a doctor's visit will help! For this analogy, we are comparing in the context of server-side web architectures. In a traditional thread model, when you get to a receptionist, you stand and fill out your form for as long as it takes. While you are filling out your form, the receptionist just sits and waits for you, unable to serve the other people behind you. The only way to scale is to add more receptionists, which is costly - both in terms of labor and in room allocation for each receptionist to sit. In an event-based system (which is what the single-threaded event loop relies on), when you get to the receptionist, you are given a form to fill out and told to come back after you have completed it. You sit down and fill out the form while the receptionist helps the next person in line. You are not blocking the receptionist from serving the next person. When you are done, you line up again and h

[Node.js] Why choose NodeJS (over other languages)?

1) What is NodeJS designed for? NodeJS was designed to create real-time websites with push capability (similar to Gmail). That is why NodeJS works as a non-blocking, event-driven I/O platform. 1.1) I/O-based vs CPU-based Operations Awesome at I/O-based operations - as long as there's an asynchronous library supporting it, which there usually is. Saves the computing time spent context switching between threads. No need to worry about threading (or running out of thread resources) or too many client requests. Good at concurrent connections that involve I/O operations. Horrible at CPU-based operations It's important to understand that even simple computations, such as sorting or traversing an array, can be 'heavy' because you are utilizing 1 thread for potentially thousands of client requests. Remember, you only have 1 thread for the event loop, so 1 bad request can block your entire event loop! Worker threads are NodeJS' approach to CPU-inten

[Node.js] Web Architecture Design

1) Traditional Web Application Processing Model - Multi-Threaded Request-Response Any web application developed without NodeJS typically follows the "Multi-Threaded Request-Response model" (or just "Request/Response Model"). This differs from the NodeJS way in that it uses multiple threads to handle concurrent client requests. Under the hood, the server waits in an infinite loop and creates one thread per client request. If the server gets many client requests that require long blocking I/O operations, then more threads become 'busy'. As a result, the remaining client requests have to wait longer because thread resources are limited. 1.1) Detailed Flow Ahead of time: the Web Server sits in an infinite loop waiting for incoming client requests. The Web Server internally maintains a limited thread pool to provide services to client requests. Once threads free up in the thread pool, the server picks them up and assigns them to the remaining client requests. Clients send re
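A minimal Java sketch of the thread-pool Request/Response model described above (the port, pool size and class names are illustrative assumptions, not taken from the original post):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiThreadedServer {
    public static void main(String[] args) throws IOException {
        // The limited thread pool the web server maintains internally
        ExecutorService threadPool = Executors.newFixedThreadPool(10);

        try (ServerSocket serverSocket = new ServerSocket(8080)) {
            // Infinite loop: wait for incoming client requests
            while (true) {
                Socket client = serverSocket.accept();
                // One pool thread is assigned to each client request;
                // if all 10 threads are busy with blocking I/O, new clients must wait
                threadPool.submit(() -> handle(client));
            }
        }
    }

    private static void handle(Socket client) {
        try (Socket c = client) {
            // Read the request, do (potentially blocking) I/O, write the response...
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}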

[Redis] Publisher/Subscriber

1) What is Publish-Subscribe (pub-sub)? Publish-subscribe (pub-sub) is a messaging pattern where senders of messages, called publishers, do not send messages directly to the receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers (if any) will receive them. 2) Implementation On 2 separate clients, subscribe to a key using "SUBSCRIBE <key> <..key(s)>". On a third client, use the key from step 1 and enter "PUBLISH <key above> <message>". You should then see the message show up in the two redis clients that subscribed to the key. 3) Unsubscribe The opposite of subscribing. 4) Pattern matching sub/unsub Use "PSUBSCRIBE <pattern>" or "PUNSUBSCRIBE <pattern>" (glob-style patterns). Resource https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern https://redis.io/topics/pubsub
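To make the steps concrete, here is a minimal redis-cli walkthrough (the channel name "news" and the message text are example values, not from the original post).

On clients 1 and 2:
127.0.0.1:6379> SUBSCRIBE news
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "news"
3) (integer) 1

On client 3:
127.0.0.1:6379> PUBLISH news "hello"
(integer) 2

Clients 1 and 2 then print:
1) "message"
2) "news"
3) "hello"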

[Redis] Transactions

1) Overview Redis transactions allow a group of commands to be executed sequentially, avoiding the problem of race conditions. It is not possible for a request issued by another client to be executed in the middle of a Redis transaction. Additionally, Redis transactions are atomic, meaning either all of the commands or none are processed. 2) Implementation A transaction follows the format of: multi <your command(s)> exec Any commands you enter between multi and exec are queued. 2.1) Example 127.0.0.1:6379> multi 127.0.0.1:6379> set num 10 127.0.0.1:6379> incrby num 10 127.0.0.1:6379> exec (integer) 20 2.2) Discard Example If you want to cancel your transaction while using the redis-cli, you can just enter discard. 127.0.0.1:6379> multi 127.0.0.1:6379> set num 10 127.0.0.1:6379> incrby num 10 127.0.0.1:6379> discard 3) Errors inside a transaction There are two types of error that can happen durin

[Node.js] MultiProcess

1) Introduction Before the introduction of worker threads, from an implementation perspective, Node.js is single-threaded (for the most part). Under the hood, there are multiple threads managed by libuv to perform asynchronous I/O operations. However, NodeJS is based on V8, which has a hard memory limit of about 1.5GB, so it cannot automatically take advantage of additional memory above that limit. On top of that, a single Node process can't take full advantage of multi-core machines. Thankfully, you can use the cluster module or a process manager (such as PM2) to spawn child/worker processes to better utilize the multiple cores you might have. 2) Cluster module 2.1) Pros Scales according to the # of CPU cores available on your machine. Easy to manage as there is no dependency on any other module/service. Easy to implement process communication. 2.2) Cons App performance takes a hit if there are too many messages passed between processes. Implementation doesn't appe

[Node.js] MultiThreading & Worker Threads

1) NodeJS is single threaded, right? Prior to the introduction of worker threads (in ver. 10.5.0), the answer is - kind of. NodeJS runs things in parallel, but the developer doesn't programmatically implement threads. Threads are automatically managed by libuv when we execute asynchronous operations, e.g. I/O operations. Together with the use of callback functions, NodeJS achieves concurrency. 2) So what is the problem? CPU-intensive tasks The single-threaded model is great if all we do is asynchronous operations. Without worker thread support, any CPU-intensive task will block the single-threaded event loop, because it is single-threaded - meaning it will wait for 1 task to complete before executing the next one. This scenario makes NodeJS not concurrent. 3) Multi-Processing Actually, it is technically possible to achieve a form of multi-threading without worker threads. We can spawn child processes by forking to achieve multi-threading (arguably). But processes are expensive an

[Node.js] Intro to Node.js

1) What is Node.js? JavaScript without the browser (e.g. server-side scripting); represents "JavaScript Everywhere". Unifies web application development around a single programming language (rather than different languages for server and client side). Event-driven, non-blocking I/O model - lightweight & efficient. Its package ecosystem, npm, is the largest ecosystem of open-source libraries in the world. Open source (free). Built on Chrome's V8 JavaScript engine. Is "fast" because it uses a small number of threads to handle many clients via asynchronous calls. However, it is only effective if each task is small. 2) JavaScript background Generally considered an interpreted language, but modern JavaScript engines compile it as well (since 2009). JavaScript is internally compiled by V8 with just-in-time (JIT) compilation to speed up execution. While it might take a bit longer to have the JavaScript ready, once done, it's more performant than purely

[Threading - Java] Future and FutureTask

1) Overview The Future interface provides methods to check whether a computation is complete, wait for its completion, and retrieve its result. The result is retrieved using Future's get() method; if the computation is not yet complete, get() blocks until it is. FutureTask is an implementation of the Future and RunnableFuture interfaces - you can use a FutureTask as a Runnable to be submitted to an ExecutorService. 2) Example Create two tasks. After the first one is completely executed, we wait in 2000ms intervals until the second task also completes. Output FutureTask1 output=FutureTask1 is complete Waiting for FutureTask2 to complete Waiting for FutureTask2 to complete Waiting for FutureTask2 to complete Waiting for FutureTask2 to complete FutureTask2 output=FutureTask2 is complete Both FutureTask Complete Resource https://www.geeksforgeeks.org/future-and-futuretask-in-java/
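The code itself is not included in this excerpt; the following is a minimal reconstruction in the spirit of the linked geeksforgeeks article (task durations and class names are assumptions) that produces output along the lines shown above:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    public static void main(String[] args) throws Exception {
        Callable<String> task1 = () -> { Thread.sleep(1000); return "FutureTask1 is complete"; };
        Callable<String> task2 = () -> { Thread.sleep(8000); return "FutureTask2 is complete"; };

        FutureTask<String> futureTask1 = new FutureTask<>(task1);
        FutureTask<String> futureTask2 = new FutureTask<>(task2);

        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(futureTask1);   // a FutureTask can be submitted as a Runnable
        executor.submit(futureTask2);

        // get() blocks until FutureTask1 is complete
        System.out.println("FutureTask1 output=" + futureTask1.get());

        // Poll FutureTask2 every 2000ms until it is done
        while (!futureTask2.isDone()) {
            System.out.println("Waiting for FutureTask2 to complete");
            Thread.sleep(2000);
        }
        System.out.println("FutureTask2 output=" + futureTask2.get());
        System.out.println("Both FutureTask Complete");
        executor.shutdown();
    }
}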

[Redis] Redis Cluster vs Redis Sentinel

1) In-Depth Guides to Redis Cluster and Sentinel For a more in-depth explanation of Redis Cluster or Redis Sentinel, please read my other posts: Sentinel: https://notafraidofwong.blogspot.com/2019/05/redis-redis-sentinel.html Cluster: https://notafraidofwong.blogspot.com/2019/05/redis-redis-cluster.html 2) Deciding between Redis Sentinel and Redis Cluster Similarities: Both solutions provide high availability for your system. 2.1) Redis Cluster 2.1.1) Pros Shards data across multiple nodes Has replication support Has built-in failover of the master 2.1.2) Cons Not every library supports it May not be as robust (yet) as standalone Redis or Sentinel Setup and maintenance are more complicated 2.2) Redis Sentinel 2.2.1) Pros Automatically selects a new master in case of failure Easy to set up, (seemingly) easy to maintain 2.2.2) Cons No data sharding; the master might be overutilized. Another distributed system to maintain. 3) So how do I decide? In

[Redis] Redis Cluster

1) Overview Provides better Scalability and Load Balancing - Redis Cluster allows your Redis data to be automatically sharded across multiple Redis nodes. High Availability - the cluster provides the ability to continue operations when a subset of the nodes experiences failures or is unable to communicate with the rest of the cluster; however, large-scale failures may stop the operation. 2) Data Sharding Data sharding is a method to break up a big database into smaller parts. The reason for data sharding is that, after a certain scale point, it is cheaper and more practical to scale horizontally (by adding more machines) than to grow vertically (by adding beefier servers or more CPU/RAM to existing servers). 3) How does it work? 3.1) Redis Cluster TCP ports Unlike Sentinel, there is no dedicated monitoring. Instead, every cluster node has 2 TCP connections open. The first one is the standard Redis TCP port used to serve clients. The other is the cluster bus port (n

[Redis] Redis Sentinel

1) Overview A system designed to help manage Redis instances. Its primary purpose is to provide a high-availability system by monitoring, notifying and providing instance failover. It does this by monitoring master and slave nodes. When the master node is down, sentinels coordinate to promote a slave node to master. 2) 4 main tasks 2.1) Monitoring Check if your master and slave instances are working as expected. 2.2) Notification Notify another program or the system administrator, via an API, when something goes wrong with the monitored instances. 2.3) Automatic Failover On master failure, Sentinel promotes one of the slaves to master, then makes the other slaves use the new master. 2.4) Configuration Provider Sentinel acts as a source of authority for client service discovery. Clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address

[Redis] Redis Persistence

1) Persistence Methods 1.1) No Persistence, dataset in memory You can disable persistence if you only want your data to exist as long as the server is running. 1.2) Redis Database File (RDB) A snapshot of the dataset per time interval. 1.3) Append Only File (AOF) Logs every write operation, which will be played again at server startup, reconstructing the original dataset. 1.4) Hybrid (RDB & AOF) Note that when Redis restarts, the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete. 2) RDB (Redis Database File) Allows you to snapshot the dataset every N seconds if there are at least M changes. 2.1) How does it work? (also called snapshotting) When Redis creates a snapshot, this happens: Redis forks. A new child process starts in addition to the parent process. The child process begins to write the dataset to a temporary RDB file. Once the child process finishes writing the new RDB file, it replaces the old one. 2.2) Commands -
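As a rough sketch of how the "every N seconds / at least M changes" rule and AOF are typically expressed in redis.conf (the values below are illustrative defaults, not recommendations from the post):

# RDB: snapshot if at least 1 key changed in 900s, 10 keys in 300s, or 10000 keys in 60s
save 900 1
save 300 10
save 60 10000

# AOF: log every write operation and fsync the log once per second
appendonly yes
appendfsync everysec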

[Java] Volatile

1) Volatile Used as an indicator to the Java compiler and threads to not cache the value of this variable and to always read it from main memory. In a multi-threaded environment, threads might cache variables locally. Volatile (or synchronization) ensures the variable is read from main memory. Makes individual reads and writes of the variable atomic (compound operations like ++ are still not atomic). Only possible with variables; cannot be used with a method or class (illegal operation). Guarantees visibility and ordering; a write to a volatile variable happens-before any subsequent read of it. Prevents the compiler or JVM from reordering code or moving it away from the synchronization barrier. 2) Example private boolean bExit; while (!bExit) { checkUserPosition(); updateUserPosition(); } The focus here is the variable bExit. One thread (the game thread) can cache the value of bExit instead of getting it from main memory every time. If, in between, any other thread (the event handler thread) changes the value, it would not be vis
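To make the excerpt's example concrete, here is a minimal, runnable sketch (the class name and timings are illustrative): without the volatile keyword the game-loop thread may cache bExit and never observe the change; declaring it volatile guarantees the write becomes visible.

public class VolatileFlagDemo {
    // volatile: always read/write bExit from main memory, never from a thread-local cache
    private static volatile boolean bExit = false;

    public static void main(String[] args) throws InterruptedException {
        Thread gameThread = new Thread(() -> {
            while (!bExit) {
                // checkUserPosition(); updateUserPosition();
            }
            System.out.println("Game thread observed bExit = true and stopped");
        });
        gameThread.start();

        Thread.sleep(1000);   // simulate the event-handler thread doing other work
        bExit = true;         // this write is guaranteed to be visible to the game thread
        gameThread.join();
    }
}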

[Synchronization - Java] Reentrant Locks vs Synchronized

1) Reentrant Lock An implementation of the Lock interface. A mutually-exclusive lock (similar to synchronized) with extended features like fairness. The lock is acquired via lock() and is held until unlock() is called. 2) Reentrant Lock vs Synchronized 2.1) Blocking Indefinitely vs Temporarily/Selectively The main difference is that a reentrant lock has the ability to be acquired interruptibly and with a timeout. tryLock() allows the program to try the lock without being blocked. In summary, the thread doesn't need to block indefinitely, whereas a synchronized block blocks indefinitely. 2.2) Fairness Supported by ReentrantLock, but not by synchronized. 2.3) Get List of Threads Waiting for the Lock Reentrant locks have the ability to see all threads waiting on the lock, accessible via its API. 3) Cons of Reentrant Lock 3.1) Readability Requires wrapping in a try-finally block, which makes code less readable and hides business logic. 3.2) Potential Bugs The programmer is now responsible for acquiring and releasing the loc
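A minimal sketch of both styles mentioned above - a blocking lock() with try-finally, and a non-blocking tryLock() with a timeout (class names and the 100ms timeout are illustrative):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ReentrantLockDemo {
    private static final ReentrantLock lock = new ReentrantLock(true); // true = fair lock
    private static int counter = 0;

    static void incrementBlocking() {
        lock.lock();            // blocks until the lock is available
        try {
            counter++;
        } finally {
            lock.unlock();      // always release in finally
        }
    }

    static void incrementWithTimeout() throws InterruptedException {
        // tryLock: give up after 100ms instead of blocking indefinitely
        if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
            try {
                counter++;
            } finally {
                lock.unlock();
            }
        } else {
            System.out.println("Could not acquire lock, doing something else");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        incrementBlocking();
        incrementWithTimeout();
        System.out.println("counter = " + counter);   // prints: counter = 2
    }
}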

[Threads - Java] What's the max number of threads I should use?

1) Starting Approach The short and quick answer is to have # threads = # CPUs/cores. Why have a thread if there's no core to run it on? Therefore, one thread per processor/core will maximize processing power and minimize context switching. This assumption is a good starting approach. public static final int THREADS = Runtime.getRuntime().availableProcessors(); 2) Detailed Approach: CPU-bound or I/O-bound? The more correct answer is - it depends on whether your task is CPU-bound or I/O-bound. 2.1) I/O-bound If your thread has to wait for a network packet or a disk block, then CPU time is wasted waiting. In this case, you will want to consider having more threads per process, as your program can work on another thread instead of just waiting. However, the overhead of adding threads has to be weighed against the additional work that will get accomplished. 2.2) CPU-bound But if your thread is CPU heavy, then a 1:1 correlation makes more sense, because adding more threads will only slow
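A minimal Java sketch of the starting approach described above (the 2x multiplier for I/O-bound work is an illustrative assumption, not a rule from the post - measure and tune for your workload):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPoolSizing {
    public static final int THREADS = Runtime.getRuntime().availableProcessors();

    public static void main(String[] args) {
        // CPU-bound work: one thread per core is a good starting point
        ExecutorService cpuBoundPool = Executors.newFixedThreadPool(THREADS);

        // I/O-bound work: more threads than cores can keep the CPU busy
        // while other threads wait on the network or disk
        ExecutorService ioBoundPool = Executors.newFixedThreadPool(THREADS * 2);

        // ... submit tasks ...
        cpuBoundPool.shutdown();
        ioBoundPool.shutdown();
    }
}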

[Cache] Caching Strategies

1) Lazy Loading Load data into the cache only when necessary. Ask the cache - if unavailable, ask the database. 1.1) Cache Hit When your data is in the cache and not expired: The application requests data from the cache. The cache returns the data to the application. 1.2) Cache Miss The application requests data from the cache. The cache doesn't have the requested data and returns null. The application requests and receives the data from the database. The application updates the cache with the requested data for quicker access next time. 1.3) Pros & Cons 1.3.1) Pros Only requested data is cached. Since most data is never requested, lazy loading avoids filling up your cache with data that isn't needed. Node failures are not fatal. When a node fails and is replaced, the new node continues to function as requests are made (as opposed to Write Through). 1.3.2) Cons Cache Miss Penalty Each cache miss results in 3 trips: Initial data request to the cache Database query Writing data to node S
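A minimal lazy-loading (cache-aside) sketch in Java; the in-memory maps stand in for a real cache (e.g. Redis) and a real database, and all names are illustrative:

import java.util.HashMap;
import java.util.Map;

public class LazyLoadingCache {
    private final Map<String, String> cache = new HashMap<>();    // stand-in for the cache
    private final Map<String, String> database = new HashMap<>(); // stand-in for the database

    public String get(String key) {
        // 1) Ask the cache first
        String value = cache.get(key);
        if (value != null) {
            return value;                  // cache hit
        }
        // 2) Cache miss: query the database
        value = database.get(key);
        if (value != null) {
            // 3) Write the data back to the cache for quicker access next time
            cache.put(key, value);
        }
        return value;
    }
}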

[Database] Simple SQL vs NoSQL

1. Schema SQL stores data in tables. NoSQL stores data in different ways (key/value, document, columnar and graph). 2. Querying SQL databases use regular SQL statements. NoSQL databases use UnQL (Unstructured Query Language) or database-specific query APIs. 3. Scalability SQL databases are vertically scalable (beef up the same server's hardware). NoSQL databases are horizontally scalable (add more servers). 4. Reliability SQL databases are ACID friendly; NoSQL sacrifices ACID properties for performance reasons. 4.1) What is ACID? A set of properties of database transactions that guarantee validity even in the event of errors, power failures, etc. Atomicity Consistency Isolation Durability 4.1.1) Atomicity Transactions are often composed of multiple statements. Atomicity guarantees each transaction is treated as a single 'unit', which either succeeds completely or fails completely. If any of the statements in a transaction fails to complete, the entire transaction fails and the database is left uncha

[NoSql] 4 Types of NoSQL Database

1. Key-Value Stores Popular DBs: Redis, Dynamo 2. Document Databases 2.1) Features Data is stored as 'objects' in some standard encoding, such as JSON or XML. A sub-class of key-value stores. No schema (because this is NoSQL). Key-Document Relationship The key is used to retrieve the document from the database. Query by content/metadata Beyond key-to-document lookup, the database offers an API or query language that enables users to retrieve documents based on content/metadata. For example, you may want a query that retrieves all documents with a certain field set to a certain value. This is what uniquely distinguishes document databases from key-value stores. 2.2) Popular Databases MongoDB, CouchDB 3. Column Databases 3.1) Features Instead of storing data in rows, these databases store data by column. 3.2) When to use? Highly dependent on the use case: During retrieval, does the requested data have attributes that we don't always need? Good for Business Analytics o

[Java - Synchronization] Exchanger

1) Intro An Exchanger is a synchronization point at which threads can swap elements. "Swap" means each thread passes one element and receives its partner's element in return. Whenever a thread arrives at the exchange point (the exchange() method), it must wait for the other thread to arrive. There is also an overloaded version, exchange(V x, long timeout, TimeUnit unit). If the corresponding pairing thread does not arrive at the exchange point within the specified timeout, the waiting thread throws a java.util.concurrent.TimeoutException. In applications, exchangers can be useful in genetic algorithms and neural networks. 2) Example 2.1) Code 2.2) Output Consumer now has Ready Queue Producer now has Empty Queue In the above example, we create an Exchanger object of type String. The Producer thread produces a “filled queue” and exchanges it with the Consumer thread for an “empty queue”. (The filled and empty queues mentioned here are just dummy string obj
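The code itself is not included in this excerpt; below is a minimal reconstruction (thread and string names are assumptions) that produces output like the one shown:

import java.util.concurrent.Exchanger;

public class ExchangerDemo {
    private static final Exchanger<String> exchanger = new Exchanger<>();

    public static void main(String[] args) {
        Thread producer = new Thread(() -> {
            try {
                // Producer hands over the "filled" queue and receives the consumer's "empty" one
                String received = exchanger.exchange("Ready Queue");
                System.out.println("Producer now has " + received);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                // Consumer hands over the "empty" queue and receives the producer's "filled" one
                String received = exchanger.exchange("Empty Queue");
                System.out.println("Consumer now has " + received);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}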