[{"content":"Why I Ended Up Customizing BaekjoonHub As I started using Baekjoon more seriously, I discovered the BaekjoonHub extension.\nIt was convenient because it could automatically upload solutions to GitHub, but there were still several pain points for my actual workflow.\nSo I first checked issues in the official repository.\nI found that people with similar concerns already existed, and there were PRs proposing related features. However, at that time, it did not look like those features would be merged soon, so I decided to customize what I needed instead of waiting.\nOriginal repository: https://github.com/BaekjoonHub/BaekjoonHub\nCustom repository: https://github.com/0AndWild/baekjoonhub_custom\nPain Points I Had Upload paths were based on the repository root, which made it hard to fit my project structure Tier paths were not granular enough, making problem organization less clean Problem directory names were awkward to use as package/path names I sometimes had to manually align Java file names with the runtime entry point (Main.java) It was difficult to organize and upload already solved problems in one shot What I Changed in My Custom Version Added Base Directory support Split tier paths into granular levels like Bronze/V Normalized problem directory names Fixed Java file name to Main.java and auto-inserted package Added bulk upload for all accepted Baekjoon problems Wrap-up This customization was focused less on adding many new features,\nand more on reducing repetitive friction in the actual solve-and-organize workflow.\nSince similar requests already existed in the official repository, I think this customization direction could be useful for others with the same pain points.\n","date":"2026-02-20T22:35:24+09:00","image":"/posts/260220_baekjoonhub/featured.png","permalink":"/en/posts/260220_baekjoonhub/","title":"Customizing the BaekjoonHub Chrome Extension"},{"content":"Introduction Following the previous post about bidirectional communication before 
WebSocket, I’m going to read the WebSocket protocol spec, RFC 6455, and cover WebSocket in more detail. Examples used for detailed explanation were generated with the help of AI.\nRFC 6455 (The WebSocket Protocol) The goal of the WebSocket protocol is to provide a mechanism for browser-based applications that need bidirectional communication with a server, without relying on multiple HTTP connections (e.g., XMLHttpRequest, long polling). WebSocket is composed of an initial handshake followed by basic message framing over TCP.\nIn the past, applications that needed bidirectional communication between a client and a server often “abused” HTTP: they polled for server updates and then sent notifications via separate HTTP calls on top of that (RFC 6202).\nOne way to solve this is to use a single TCP connection for bidirectional traffic. WebSocket supports this approach. WebSocket API spec: https://websockets.spec.whatwg.org/\nWebSocket was designed to replace bidirectional communication techniques that used HTTP as a transport layer in order to benefit from existing infrastructure such as proxies, filtering, and authentication. Existing techniques were implemented as compromises between efficiency and reliability, because HTTP was not originally built for bidirectional communication.\nThe WebSocket protocol attempts to address the goals of existing bidirectional HTTP techniques within HTTP infrastructure environments. So it is designed to work over ports 80 and 443, and to support HTTP proxies and intermediaries—even if that adds complexity in today’s environment.\nThat said, this design does not limit WebSocket to HTTP. 
The spec notes that future implementations could use a simpler handshake over a dedicated port without reinventing the entire protocol.\nThe document emphasizes this because the traffic patterns of interactive messaging often do not resemble standard HTTP traffic, which can cause abnormal load on some infrastructure components.\nProtocol Overview The WebSocket protocol consists of two parts: the handshake and data transfer.\n// Client handshake request GET /chat HTTP/1.1 Host: server.example.com Upgrade: websocket Connection: Upgrade Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== Origin: http://example.com Sec-WebSocket-Protocol: chat, superchat Sec-WebSocket-Version: 13 // Server handshake response HTTP/1.1 101 Switching Protocols Upgrade: websocket Connection: Upgrade Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= Sec-WebSocket-Protocol: chat The client follows the Request-Line format and the server follows the Status-Line format. (RFC 2616)\nOnce both sides send the handshake and it succeeds, data transfer begins. This is a bidirectional communication channel where each side can send data independently, without waiting for the other side to “request” it.\nNow the client and server exchange data in units called “messages” in the WebSocket spec.\nA WebSocket message may not map 1:1 to frames at a particular network layer. The reason is that intermediate devices can coalesce fragmented messages or do the opposite (split them further).\nEach frame that belongs to the same message contains the same type of data. 
Broadly, there are textual data, binary data, and control frames (not application data; e.g., protocol-level signaling such as closing the connection).\nThe WebSocket protocol defines six frame types and reserves ten for future use.\nOpcode Type Name Description 0x0 Data Continuation Continuation of the previous frame’s payload 0x1 Data Text UTF-8 text data 0x2 Data Binary Binary data 0x3~0x7 Data Reserved Reserved for future data frame extensions (5) 0x8 Control Close Request to close the connection 0x9 Control Ping Heartbeat / liveness check 0xA Control Pong Response to Ping 0xB~0xF Control Reserved Reserved for future control frame extensions (5) “A fragmented message can be coalesced, or the opposite can happen” That line can be confusing, so here’s an example.\nFirst, why does fragmentation happen? Common reasons include:\nMTU (Maximum Transmission Unit) limits (network packet size limit, typically ~1500 bytes) Packets larger than the MTU may be split into smaller pieces on the path (fragmented). Think of it like a height limit in a tunnel.\nThe server may be configured to send in chunks of a specific size.\nSimilar to (1): intermediaries such as proxies, load balancers, or API gateways may split large frames.\nMemory efficiency: splitting avoids buffering a huge payload all at once.\nBack to the main point—let’s look at both coalescing and splitting.\n1. Coalesced (original: sent in fragments) [Frame 1: FIN=0, opcode=text, \u0026#34;Hello \u0026#34;] [Frame 2: FIN=0, opcode=continuation, \u0026#34;World\u0026#34;] [Frame 3: FIN=1, opcode=continuation, \u0026#34;!\u0026#34;] (an intermediary receives the three frames and forwards them as one) [Frame: FIN=1, opcode=text, \u0026#34;Hello World!\u0026#34;] 2. 
Split (original) [Frame: FIN=1, \u0026#34;Hello World!\u0026#34;] (an intermediary splits the original frame) [Frame 1: FIN=0, \u0026#34;Hello \u0026#34;] [Frame 2: FIN=1, \u0026#34;World!\u0026#34;] As another example, assume we have a Spring Boot service (acting as a client) receiving real-time stock data via WebSocket from a financial server (acting as the server). Let’s walk through it.\nAssume the financial server sends a single message split across multiple frames (and they are not coalesced in the middle): [WebSocket Frame 1] FIN: 0 (not done yet) Opcode: 0x1 (text) Payload: {\u0026#34;stockCode\u0026#34;:\u0026#34;005930\u0026#34;,\u0026#34;price\u0026#34;:71500,\u0026#34;vol [WebSocket Frame 2] FIN: 0 (not done yet) Opcode: 0x0 (continuation) Payload: ume\u0026#34;:50000,\u0026#34;time\u0026#34;:\u0026#34;09:00:01\u0026#34;,\u0026#34;seller\u0026#34;:\u0026#34; [WebSocket Frame 3] FIN: 1 (this is the last) Opcode: 0x0 (continuation) Payload: foreign\u0026#34;,\u0026#34;buyer\u0026#34;:\u0026#34;institution\u0026#34;,...} @Component public class WebSocketHandler extends TextWebSocketHandler { @Override protected void handleTextMessage(WebSocketSession session, TextMessage message) { // message.getPayload() is already a complete message here // You receive the full payload: {\u0026#34;stockCode\u0026#34;:\u0026#34;005930\u0026#34;,\u0026#34;price\u0026#34;:71500,\u0026#34;volume\u0026#34;:50000,...} String payload = message.getPayload(); } } The reason payload is a complete message is that Spring buffers Frame 1 and 2 (because FIN=0), concatenates them, and when it sees Frame 3 with FIN=1, it combines the buffered data and then invokes handleTextMessage().\nOne thing I felt while studying this is that, because frameworks implement so many things so well for us, it’s easy to ship features without really understanding what’s happening at the network level.\nTo summarize the “big picture” of the WebSocket protocol so far:\nIt consists of two parts: Handshake 
and Data Transfer. After the handshake succeeds, the connection becomes bidirectional and data transfer happens. Data is exchanged as a unit called a message. A message may not match a specific network-layer frame, because intermediaries can split or coalesce it. WebSocket defines 16 opcodes total: 3 data frames + 3 control frames + 5 reserved data + 5 reserved control. Next, let’s look at how the Opening Handshake works.\nOpening Handshake Header Purpose Upgrade Request protocol upgrade Connection Request connection upgrade Sec-WebSocket-Key Security key (Base64-encoded random 16 bytes) Sec-WebSocket-Version Protocol version (currently 13) Sec-WebSocket-Protocol Subprotocol (optional) Origin Origin info sent by the browser The initial handshake is designed to be compatible with HTTP-based servers and intermediaries, so that both HTTP clients and WebSocket clients can use a single port to talk to the same server. Therefore, the WebSocket client’s handshake is an HTTP Upgrade request.\nHeader fields in the handshake can be sent in any order, so the order in which different header fields are received is not important. The client includes the host name in the Host header field so that both the client and server can confirm they agree on the host being used.\nAdditional header fields are used to select options in the WebSocket protocol. Common options include a subprotocol selector (Sec-WebSocket-Protocol), the list of extensions the client supports (Sec-WebSocket-Extensions), and the Origin field. The Sec-WebSocket-Protocol request header indicates subprotocols (application-level protocols layered on top of WebSocket) that the client is willing to use. 
The server may select one of the acceptable protocols—or none—and it echoes the selected value in the handshake response.\n// Client handshake request GET /chat HTTP/1.1 Host: server.example.com Upgrade: websocket Connection: Upgrade Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== Origin: http://example.com Sec-WebSocket-Protocol: chat, superchat Sec-WebSocket-Version: 13 // Server handshake response HTTP/1.1 101 Switching Protocols Upgrade: websocket Connection: Upgrade Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= Sec-WebSocket-Protocol: chat \u0026lt;- the server echoes the protocol it selected When the server receives the client’s handshake, it must include two key pieces of information in the response. The first is Sec-WebSocket-Accept.\nHow to compute Sec-WebSocket-Accept The Sec-WebSocket-Accept field indicates whether the server is willing to accept the WebSocket connection.\nTake Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== Concatenate: dGhlIHNhbXBsZSBub25jZQ== + 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 Compute SHA-1 hash: 0xb3 0x7a 0x4f 0x2c 0xc0 0x62 0x4f 0x16 0x90 0xf6 0x46 0x06 0xcf 0x38 0x59 0x45 0xb2 0xbe 0xc4 0xea Base64 encode: \u0026quot;s3pPLMBiTxaQ9kYGzzhZRbK+xOo=\u0026quot; Include Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo= in the response By returning Sec-WebSocket-Accept and status code 101, the server tells the client it accepted the handshake.\n// If the server accepts the client\u0026#39;s handshake and the connection is established normally, the status code is 101. // Any status code other than 101 means the WebSocket handshake did not complete. HTTP/1.1 101 Switching Protocols Based on this response, the client checks that Sec-WebSocket-Accept matches the expected value, required header fields are present, and the HTTP status code is 101. If any of these checks fail, no WebSocket frames are sent.\nTo recap:\nThe handshake begins as an HTTP protocol upgrade request. Header order does not matter. 
The server response must include Sec-WebSocket-Accept computed from Sec-WebSocket-Key, along with HTTP status 101. Next, let’s look at how a WebSocket connection is closed.\nClosing Handshake WebSocket also uses a handshake-style process when closing the connection.\nOne side sends a control frame indicating close. The other side responds with a close control frame. After both sides have sent and received a close frame, the TCP connection is closed. The reason WebSocket uses its own closing handshake is to complement the TCP closing handshake (FIN/ACK). If there are intercepting proxies or other intermediaries, TCP closing handshake signals are not always reliably end-to-end, according to the spec.\nWhat does “not reliably end-to-end” mean here? Using the TCP closing handshake (4-way handshake) can lead to data loss.\n// TCP Closing handshake Client Server | | |-------- FIN -----------\u0026gt;| \u0026#34;I\u0026#39;m done sending\u0026#34; |\u0026lt;------- ACK ------------| \u0026#34;OK\u0026#34; |\u0026lt;------- FIN ------------| \u0026#34;I\u0026#39;m done too\u0026#34; |-------- ACK -----------\u0026gt;| \u0026#34;OK\u0026#34; | | The core issue described is that with intermediaries (proxies, load balancers), TCP close may not propagate end-to-end as expected.\nLet’s look at an example if you rely only on the TCP closing handshake.\nHere’s a quick reference for flags:\nFlag Purpose When used SYN Start connection Start 3-way handshake ACK Acknowledge receipt Included in most packets FIN Graceful close When no more data to send RST Abort / reset Error cases, abnormal close // Reproducing a scenario: the client closes the socket while the stock server is still sending data. 
// Stock Server side session.sendMessage(new TextMessage(stockData1)); // delivered session.sendMessage(new TextMessage(stockData2)); // delivered, but the client hasn\u0026#39;t read it yet // Client side - suddenly closes the socket socket.close(); // stockData2 is still in the receive queue when closing What happens at the OS level:\n[Client receive queue] +----------------------+ | stockData2 (unread) | ← the app hasn\u0026#39;t consumed it yet +----------------------+ socket.close() is called ↓ OS: \u0026#34;You want to close with unread data still pending?\u0026#34; ↓ OS sends an RST packet (instead of a graceful FIN) ↓ Stock Server receives RST ↓ Stock Server\u0026#39;s recv() fails (Connection reset by peer) RST vs FIN FIN: \u0026#34;I\u0026#39;m done. Let\u0026#39;s close cleanly.\u0026#34; RST: \u0026#34;Emergency! Reset the connection—something went wrong!\u0026#34; When the stock server receives an RST:\nIt discards any remaining data it was about to send. It logs errors. Connection state becomes harder to reason about. At this point you might wonder: doesn’t the same issue apply to HTTP request/response too?\nYes, the same issue can happen. But it usually doesn’t matter much:\nThe request/response is already complete. The connection is short-lived. The next request can use a new connection. 
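The abrupt-close behavior above is OS-level TCP, not WebSocket-specific. Below is a rough, self-contained sketch (class and helper names are mine) that forces an RST on a loopback socket via SO_LINGER with a zero timeout, so the peer's blocked read fails, mirroring the "Connection reset by peer" case described above. It demonstrates the general RST mechanism under that assumption, not the exact unread-data code path, and exception details vary by OS/JDK.

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

public class RstDemo {
    // Returns what the "server" observes after the client aborts the connection.
    static String serverSideOutcome() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort())) {
                    Thread.sleep(300);       // let the server's data arrive, but never read it
                    s.setSoLinger(true, 0);  // make close() send RST instead of a graceful FIN
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();

            Socket conn = server.accept();
            conn.getOutputStream().write("stockData2\n".getBytes()); // client never consumes this
            conn.getOutputStream().flush();
            try {
                int b = conn.getInputStream().read(); // blocks until the RST arrives
                return b == -1 ? "clean FIN" : "data";
            } catch (SocketException e) {
                return "connection reset";            // recv() fails, as described above
            } finally {
                client.join();
                conn.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serverSideOutcome());
    }
}
```

With a plain `close()` (no linger tweak) the reader would instead see a clean end-of-stream (`read()` returning -1), which is the FIN case.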
HTTP example Client Proxy Server │ │ │ │── GET /stock ─────────\u0026gt;│── GET /stock ─────────\u0026gt;│ │ │ │ │\u0026lt;── 200 OK + data ──────│\u0026lt;── 200 OK + data ──────│ │ │ │ │── FIN ────────────────\u0026gt;│ │ │ │ (may or may not forward) │ │ │ HTTP is stateless, and once a request/response ends, the connection’s job is done—so an abnormal close is typically not a big deal.\nWebSocket example Client Proxy Server │ │ │ │══ WebSocket connection (persistent) ════════════│ │ │ │ │\u0026lt;── stockData1 ─────────│\u0026lt;── stockData1 ─────────│ │\u0026lt;── stockData2 ─────────│\u0026lt;── stockData2 ─────────│ │\u0026lt;── stockData3 ─────────│\u0026lt;── stockData3 ─────────│ │ ... │ ... │ │ │ │ │── FIN ────────────────\u0026gt;│ │ │ │ (not forwarded) │ │ │ │ │ │\u0026lt;── stockData4 ─────────│ ← server keeps sending! │ │\u0026lt;── stockData5 ─────────│ │ │\u0026lt;── stockData6 ─────────│ │ │ │ │ │ buffered in proxy │ │ │ or lost somewhere │ But WebSocket is stateful and the connection stays open while data flows continuously, so this becomes a real problem.\nThat’s why WebSocket closes via its own handshake. The close frame is an application-layer message that intermediaries must forward. 
Unlike TCP FIN, a proxy cannot arbitrarily handle it.\nClient Proxy Server │ │ │ │── Close Frame ────────\u0026gt;│── Close Frame ────────\u0026gt;│ ← application level │ │ │ │ │ Server: \u0026#34;closing\u0026#34; │ │ │ unsubscribe │ │ │ stop sending │ │ │ │ │\u0026lt;── Close Frame ────────│\u0026lt;── Close Frame ────────│ │ │ │ │── FIN ────────────────\u0026gt;│── FIN ────────────────\u0026gt;│ ← then TCP closes WebSocket design philosophy: minimal framing Core principle RFC 6455 states WebSocket’s design principle like this:\n\u0026ldquo;The WebSocket Protocol is designed on the principle that there should be minimal framing\u0026rdquo;\nThe framing WebSocket provides is for exactly two purposes:\nStream → message conversion: TCP is a continuous byte stream, but applications think in “messages” Text vs binary distinction: whether a payload is UTF-8 text or arbitrary binary All other metadata (message type, routing, authentication, etc.) is intentionally left to the application layer.\nThe TCP problem: no message boundaries TCP is a byte-stream protocol. 
Data flows continuously like water through a pipe; there’s no notion of “this is the end of a message.”\nSender: send(\u0026#34;Hello\u0026#34;) send(\u0026#34;World\u0026#34;) Inside the TCP pipe: [H][e][l][l][o][W][o][r][l][d] ← all contiguous What the receiver may actually get: recv() → \u0026#34;Hel\u0026#34; recv() → \u0026#34;loWor\u0026#34; recv() → \u0026#34;ld\u0026#34; The receiver can’t tell where “Hello” ends and “World” begins.\nWebSocket’s solution: restore boundaries with frames WebSocket wraps each message in frames to define boundaries:\nSender: ws.send(\u0026#34;Hello\u0026#34;) ws.send(\u0026#34;World\u0026#34;) After WebSocket framing: [FIN=1, len=5, \u0026#34;Hello\u0026#34;][FIN=1, len=5, \u0026#34;World\u0026#34;] ├─── Frame 1 ───────┤├─── Frame 2 ────────┤ Receiver: onMessage(\u0026#34;Hello\u0026#34;) ← receives exactly the original message units onMessage(\u0026#34;World\u0026#34;) What “minimal” means This is all the information that appears in the WebSocket frame header:\nField Purpose FIN Whether this is the last frame of the message Opcode Text (0x1) vs Binary (0x2) vs Control (0x8, 0x9, 0xA) Length Payload length Mask Whether masking is used (security) Compared to HTTP headers, the difference is obvious:\nHTTP headers (hundreds of bytes): Content-Type: application/json Content-Length: 42 Authorization: Bearer xxx X-Request-ID: abc123 Cache-Control: no-cache ... and so on WebSocket frame header (2~14 bytes): [FIN + opcode][MASK + length] That\u0026#39;s it. 
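To make the minimal header concrete, here is a small sketch (class and method names are mine) that encodes and decodes the simplest frame shape: a final, unmasked text frame with a payload shorter than 126 bytes. Longer payloads and the client-to-server masking required by RFC 6455 extend the header toward the 14-byte maximum mentioned above.

```java
import java.nio.charset.StandardCharsets;

public class FrameSketch {
    // Build a minimal unmasked (server-to-client style) text frame.
    // Assumes payload < 126 bytes, so the length fits in the 7-bit field.
    static byte[] encodeText(String msg) {
        byte[] payload = msg.getBytes(StandardCharsets.UTF_8);
        if (payload.length >= 126) throw new IllegalArgumentException("extended length not handled");
        byte[] frame = new byte[2 + payload.length];
        frame[0] = (byte) 0x81;           // FIN=1, RSV=000, opcode=0x1 (text)
        frame[1] = (byte) payload.length; // MASK=0, 7-bit payload length
        System.arraycopy(payload, 0, frame, 2, payload.length);
        return frame;
    }

    // Decode the same minimal shape back into the original text message.
    static String decodeText(byte[] frame) {
        boolean fin = (frame[0] & 0x80) != 0;
        int opcode = frame[0] & 0x0F;
        int len = frame[1] & 0x7F;        // MASK bit assumed 0 here
        if (!fin || opcode != 0x1) throw new IllegalArgumentException("not a final text frame");
        return new String(frame, 2, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] f = encodeText("Hello");   // header bytes: 0x81, 0x05
        System.out.println(decodeText(f)); // Hello
    }
}
```

Two bytes of header per small message is the whole per-frame cost, which is the contrast with HTTP headers drawn above.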
What WebSocket does not do The “minimal framing” philosophy also means: everything else is your job.\nExample of an actual client message: { \u0026#34;type\u0026#34;: \u0026#34;SUBSCRIBE\u0026#34;, \u0026#34;channel\u0026#34;: \u0026#34;stock.005930\u0026#34;, \u0026#34;userId\u0026#34;: \u0026#34;gun0\u0026#34;, \u0026#34;token\u0026#34;: \u0026#34;abc123\u0026#34; } What WebSocket knows:\n\u0026ldquo;This is a text frame and its length is 120 bytes.\u0026rdquo; What WebSocket doesn’t know:\nWhat type means How to route by channel How to authenticate/authorize using userId and token All of that must be implemented by the application layer.\nWhy you use subprotocols like STOMP With plain WebSocket only:\n@OnMessage public void onMessage(String message) { // message is just a string // you must parse it and implement routing / semantics yourself JSONObject json = new JSONObject(message); String type = json.getString(\u0026#34;type\u0026#34;); if (type.equals(\u0026#34;SUBSCRIBE\u0026#34;)) { // implement subscribe logic } else if (type.equals(\u0026#34;UNSUBSCRIBE\u0026#34;)) { // implement unsubscribe logic } else if (type.equals(\u0026#34;SEND\u0026#34;)) { // implement message sending logic } } If you layer STOMP on top:\n@MessageMapping(\u0026#34;/stock/{stockCode}\u0026#34;) public void handleStock(@DestinationVariable String stockCode, StockRequest request) { // message type and routing are already handled } What WebSocket adds on top of TCP RFC 6455 clearly defines WebSocket’s role:\n1. Web Origin-based security model Origin: http://example.com In browser environments, this tells the server “where the script came from,” giving the server a basis to reject cross-origin requests.\n2. 
Addressing and protocol naming GET /chat HTTP/1.1 Host: server.example.com Sec-WebSocket-Protocol: stomp, mqtt You can provide multiple services on a single IP + port:\nDistinguish endpoints via path (/chat, /notifications) Virtual hosting via the Host header Negotiate subprotocols via Sec-WebSocket-Protocol 3. Framing mechanism The RFC uses an interesting phrasing:\n\u0026ldquo;layers a framing mechanism on top of TCP to get back to the IP packet mechanism that TCP is built on, but without length limits\u0026rdquo;\nLayer Property IP Packet-based, clear boundaries, size limits (~1500 bytes) TCP Stream-based, no boundaries, no size limits WebSocket Frame-based, clear boundaries, no size limits TCP stitches IP packets together into a continuous stream, and WebSocket restores application-level message boundaries.\n4. Proxy-friendly Closing Handshake TCP FIN/ACK alone can cause data loss when a proxy is involved:\n[Client] ----data----\u0026gt; [Proxy] ----data----\u0026gt; [Server] [Client] \u0026lt;---FIN------ [Proxy] [Server] ← proxy may cut off independently WebSocket Close frames negotiate closure at the application layer, which is safer:\n[Client] ---Close Frame---\u0026gt; [Proxy] ---Close Frame---\u0026gt; [Server] [Client] \u0026lt;--Close Frame---- [Proxy] \u0026lt;--Close Frame---- [Server] [Client] -------TCP FIN-------\u0026gt; ... -------TCP FIN-------\u0026gt; [Server] “As close to raw TCP as possible” A key sentence from the RFC:\n\u0026ldquo;Basically it is intended to be as close to just exposing raw TCP to script as possible given the constraints of the Web.\u0026rdquo;\nBrowsers cannot allow JavaScript to open raw TCP sockets directly (for security reasons). 
WebSocket aims to provide an experience as close to TCP as possible within those constraints.\nWhat WebSocket does not add:\nFeature Why Message IDs Up to the application Request-response mapping It’s a bidirectional stream Retransmission / ordering TCP already provides this Compression Raw by default (possible via extensions) Authentication Typically handled in the HTTP handshake Routing Typically handled by a subprotocol Coexisting with HTTP infrastructure \u0026ldquo;It\u0026rsquo;s also designed in such a way that its servers can share a port with HTTP servers\u0026rdquo;\nThis is a very practical design decision:\nPort 80/443 │ ├── GET /api/users HTTP/1.1 → handled as normal HTTP ├── GET /index.html HTTP/1.1 → handled as normal HTTP └── GET /ws HTTP/1.1 → WebSocket upgrade Upgrade: websocket Benefits:\nCan pass through existing load balancers, proxies, and firewalls No need to open extra ports Can share TLS certificates This is why the handshake takes the form of an HTTP Upgrade request. 
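Since the server shares the port by answering the Upgrade request, its side of the handshake reduces to one computation. Here is a minimal sketch (class and method names are mine) of deriving Sec-WebSocket-Accept from the client's Sec-WebSocket-Key, using the fixed GUID and the sample values from RFC 6455:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class AcceptKey {
    // Fixed GUID defined by RFC 6455.
    static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Sec-WebSocket-Accept = Base64( SHA-1( Sec-WebSocket-Key + GUID ) )
    static String accept(String secWebSocketKey) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest((secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        // The sample key from the RFC yields the sample accept value.
        System.out.println(accept("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    }
}
```

Everything else in the 101 response is plain HTTP header handling, which is exactly why existing HTTP infrastructure can route it.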
The RFC also mentions:\n\u0026ldquo;the design does not limit WebSocket to HTTP, and future implementations could use a simpler handshake over a dedicated port without reinventing the entire protocol\u0026rdquo;\nIn other words, HTTP compatibility is a choice for today’s web infrastructure, not the essence of the protocol.\nExtensibility \u0026ldquo;The protocol is intended to be extensible; future versions will likely introduce additional concepts such as multiplexing.\u0026rdquo;\nReserved for extensions:\nReserved item Purpose RSV1, RSV2, RSV3 bits Per-frame extension flags Opcode 0x3-0x7 Additional data frame types Opcode 0xB-0xF Additional control frame types Sec-WebSocket-Extensions header Extension negotiation A real-world extension example — permessage-deflate:\nSec-WebSocket-Extensions: permessage-deflate; client_max_window_bits This is a message compression extension; it uses the RSV1 bit to mark “this frame is compressed.”\nAnalogy TCP = a highway\nCars (bytes) keep flowing No clear boundary lines showing where one “group” ends and the next begins WebSocket = container trucks\nGoods (messages) are packed into containers (frames) The truck doesn’t know what’s inside; it just carries the container safely It only needs minimal info like container size STOMP = a logistics system\nAdds manifests: “this goes to A, that goes to B” Categorizes goods by type Adds tracking and routing WebSocket is like the container truck: it delivers payloads safely as chunks, while higher-level protocols such as STOMP manage what’s inside.\nSummary WebSocket’s design philosophy can be summarized as: “do the minimum, delegate the rest.”\nWhat WebSocket does What WebSocket doesn’t do Define message boundaries Message types / routing Distinguish text vs binary Auth / authorization Keep the connection alive Reconnect logic Ping/Pong heartbeat Business logic Origin-based security basis Application-level security Because of this philosophy, WebSocket is lightweight and 
general-purpose. You can layer STOMP, Socket.IO, or your own custom protocol on top depending on your needs.\nThat’s what I learned from reading RFC 6455 and digging into how WebSocket works. In the next post, I’m thinking of exploring how WebSocket is implemented in Spring Boot.\n","date":"2026-01-21T15:38:26+09:00","image":"/posts/260210_websocket/featured.png","permalink":"/en/posts/260210_websocket/","title":"What is WebSocket? (RFC 6455)"},{"content":"Introduction While working on a mock investment project, I needed to serve real-time stock price fluctuation data and ended up using WebSocket technology. I\u0026rsquo;ve known about WebSocket, a bidirectional real-time communication technology, for some time, but never dug deep into it. I took this opportunity to dive in thoroughly.\nWhile reading RFC 6455, the WebSocket technical document, I found a section that referenced RFC 6202 (Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP) to explain earlier techniques and their problems. Today, I\u0026rsquo;ll read RFC 6202 to learn how bidirectional communication was handled before WebSocket emerged, and explore the thinking and best practices of that time.\nRFC 6202 (Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP) This document, written in April 2011, discusses known issues and best practices for HTTP long polling and HTTP streaming, the main bidirectional HTTP techniques of the time. It also acknowledges that both are extensions of HTTP, and that the HTTP protocol was not designed for bidirectional communication. The authors note that the document neither recommends nor discourages these two methods, but rather focuses on discussing good use cases and known issues.\nFundamentally, HTTP (Hypertext Transfer Protocol: RFC 2616) is a request/response protocol. HTTP defines three entities: clients, proxies, and servers. 
A client opens a connection to send an HTTP request to a server, and the server accepts the connection, processes the request, and returns a response. Proxies are intermediaries that can relay requests and responses between clients and servers.\nBy default, the standard HTTP model doesn\u0026rsquo;t allow servers to send asynchronous events to clients, because servers cannot initiate connections to clients and cannot send unsolicited HTTP responses.\nSo to receive asynchronous events as quickly as possible, clients must periodically poll the server. This continuous polling forces request/response cycles even when there\u0026rsquo;s no data, consuming network resources and hurting responsiveness, since data queues up on the server until the next polling request arrives.\nHTTP long polling \u0026amp; HTTP streaming 1. HTTP long polling Traditional short polling is a technique where clients periodically send requests to the server to check for updates; when there are no new events, they receive empty responses, and new data must wait until the next poll. 
This technique\u0026rsquo;s request cycle is determined by the delay time set by the client, and when the cycle is short (high polling frequency), it can place a heavy burden on both the server and the network.\nIn contrast, long polling attempts to minimize message delivery delays and network resource usage by responding to a request only when a specific event, state change, or network timeout occurs.\nHTTP long polling life cycle Client creates an initial request and waits for a response Server withholds the response until an update is available or a specific state or timeout occurs When an update becomes available, the server sends a response to the client Immediately after receiving the response, the client creates a new long poll request, either right away or after an acceptable delay HTTP long polling issues Header overhead: Since every request/response is an HTTP message, HTTP headers always accompany it even if the data is small. For small data, headers can constitute a significant portion of the transmission. If the network MTU (Maximum Transmission Unit) can accommodate all the information including headers in a single IP packet, the network burden isn\u0026rsquo;t significant. However, when small messages are exchanged frequently, the transmitted volume becomes large relative to the actual data. For example, it\u0026rsquo;s like sending a single sheet of paper (20g) in a delivery box (300g).\nMaximal latency: Even if the server wants to send a new message immediately after sending a long poll response, it must wait until the client\u0026rsquo;s next request arrives. Average latency is close to 1 network transit, but in the worst case it can extend up to 3 network transits (response-request-response), and TCP retransmissions after packet loss can add even more.\nConnection Establishment: Both short polling and long polling are criticized for frequently opening and closing TCP/IP connections. 
However, both polling mechanisms work well with persistent HTTP connections that can be reused.\nAllocated Resources: Operating systems allocate resources to TCP/IP connections and pending HTTP requests. HTTP long polling requires both TCP/IP connections and HTTP requests to remain open for each client. Therefore, when determining the scale of an HTTP long polling application, it\u0026rsquo;s important to consider resources related to both.\nGraceful Degradation: When servers or clients are overloaded, messages can queue up and multiple messages can be bundled in one response. Latency increases but per-message overhead decreases, naturally distributing the load.\nTimeouts: Long poll requests can have timeout issues because they must maintain a hanging state until the server has data to send.\nCaching: If intermediate proxies or CDNs cache responses, the problem of receiving old data instead of fresh data can occur. While clients or hosts have no way to inform HTTP intermediaries that long polling is in use, caching that could interfere with bidirectional flow can be controlled with standard headers or cookies. As a best practice, caching should always be intentionally suppressed in long polling requests or responses. Set the \u0026ldquo;Cache-Control\u0026rdquo; header to \u0026ldquo;no-cache\u0026rdquo;.\n2. HTTP Streaming The mechanism of HTTP streaming is to never terminate the request or disconnect the connection even after the server sends data to the client. 
This mechanism significantly reduces network latency because clients and servers don\u0026rsquo;t need to repeatedly open and tear down connections.\nHTTP Streaming life cycle Client creates an initial request and waits for a response Server withholds the response until an update is available or a specific state or timeout occurs When an update becomes available, the server sends a response to the client After sending data, the server continues step 3 without terminating the request or disconnecting HTTP Streaming issues Network Intermediaries: The HTTP protocol allows intermediaries (proxies, transparent proxies, gateways, etc.) to intervene in the delivery of responses from server to client. HTTP streaming does not work reliably through such intermediaries, because they may buffer or delay partial responses.\nMaximal Latency: Theoretically 1 network transit, but in practice connections must be periodically disconnected and reconnected to prevent unbounded growth in memory usage related to JavaScript/DOM elements. Ultimately, like long polling, the maximum latency is 3 network transits.\nClient Buffering: According to the HTTP specification, there is no obligation to process partial responses immediately. While most browsers execute streamed JavaScript as it arrives, some do so only after an internal buffer fills. Padding the response with whitespace to force the buffer to flush is a common workaround.\nFraming Techniques: When using HTTP Streaming, multiple application messages can be transmitted in a single HTTP response. However, because intermediaries such as proxies can re-chunk the response, chunk boundaries cannot be relied on to delimit messages. Therefore, message separators must be defined at the application level. Long polling doesn\u0026rsquo;t have this problem because there is one message per response.\nOther server-push mechanisms Besides the two mechanisms above, this section introduces Bayeux (4.1), BOSH (4.2), Server-Sent Events (4.3), etc. 
It covers recommendations when using the SSE mechanism, as follows:\nThe specification recommends disabling HTTP chunking. The reason is the same as the HTTP streaming issues explained above.\nIntermediate proxies can re-chunk chunks Some proxies can buffer entire responses Best Practices Summary Item Key Content Recommendation Connection Limit 6-8 per browser limit Use one for long poll, detect duplicates with cookies Pipelining Regular requests can get blocked behind long poll Check support before use, prepare fallback Proxies Starvation occurs when connections are shared Use async proxies, avoid connection sharing Timeouts Too high gets 408/504, too low wastes traffic 30 seconds recommended Caching Real-time data shouldn\u0026rsquo;t be cached Cache-Control: no-cache mandatory Security Vulnerable to Injection, DoS Input validation, connection limit 1. Limits to the Maximum Number of Connections Background The HTTP specification (RFC 2616) originally recommended that a single client maintain a maximum of 2 connections to a server. There are two reasons:\nPrevent server overload Prevent unexpected side effects in congested networks Recent browsers have increased this limit to 6-8, but limits still exist. The problem is that users quickly exhaust these connections when opening multiple tabs or frames.\nWhy is this a problem? Long polling occupies connections for a long time. If 3 tabs each open 2 long polls:\nTab 1: 2 long poll connections Tab 2: 2 long poll connections Tab 3: 2 long poll connections ───────────────────── Total 6 connections → Browser limit reached In this state, to send regular HTTP requests (images, API calls, etc.), you must wait until existing connections end. This is called connection starvation.\nRecommendations Client side:\nIdeally, limit long poll requests to one and have multiple tabs/frames share it However, sharing resources between tabs is difficult due to browser security models (Same-Origin Policy, etc.) 
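One way to approximate this client-side sharing (a sketch I am adding, not something prescribed by RFC 6202) is to elect a single "leader" tab over a same-origin BroadcastChannel and let only that tab open the long poll. The election rule here (smallest tab id wins) is a toy; a real application might use the Web Locks API instead.

```javascript
// Sketch: keep at most one long poll per browser by electing a leader tab.
// isLeader, joinTabGroup, and the "lp-tabs" channel name are illustrative
// names invented for this example.

// Pure helper: given our id and the peer ids we have seen, are we leader?
function isLeader(myId, peerIds) {
  return [...peerIds, myId].sort()[0] === myId;
}

// Browser-side wiring: announce ourselves and track peers on the channel.
// BroadcastChannel is same-origin only, matching the constraint above.
function joinTabGroup(myId, channel = new BroadcastChannel("lp-tabs")) {
  const peers = new Set();
  channel.onmessage = (e) => peers.add(e.data);
  channel.postMessage(myId); // tell the other tabs we exist
  return () => isLeader(myId, [...peers]); // only the leader long-polls
}
```

Follower tabs simply listen on the channel for events rebroadcast by the leader, so the browser's per-host connection budget is consumed only once.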
Server side:\nMust use cookies to detect duplicate long poll requests from the same browser When duplicate requests are detected, don\u0026rsquo;t make both wait—immediately respond to one to release it [Wrong handling] Request 1: Waiting... Request 2: Waiting... ← Both waiting causes connection starvation [Correct handling] Request 1: Waiting... Request 2: Arrives → Immediately send empty response to Request 1 → Only Request 2 waits 2. Pipelined Connections What is pipelining? A feature supported in HTTP/1.1 that allows sending multiple requests consecutively without waiting for responses.\n[Without pipelining] Request 1 → Response 1 → Request 2 → Response 2 → Request 3 → Response 3 [With pipelining] Request 1 → Request 2 → Request 3 → Response 1 → Response 2 → Response 3 Advantages in Long Polling Useful when the server needs to send multiple messages in a short time. With pipelining, the server doesn\u0026rsquo;t need to wait for the client\u0026rsquo;s new request after responding. Requests are already queued.\nProblem: Regular requests get blocked There\u0026rsquo;s a critical problem with pipelining. If a regular request gets queued behind a long poll, it must wait until the long poll ends.\n[Pipeline queue] 1. Long poll request (waiting 30 seconds...) 2. Image request ← Wait until long poll ends 3. API request ← Wait until long poll ends This can delay page loading by 30 seconds.\nPrecautions HTTP POST pipelining is not recommended in RFC 2616 Protocols like BOSH or Bayeux pipeline POSTs while guaranteeing order with request IDs To use pipelining, must verify that clients, intermediate equipment, and servers all support it If not supported, must fallback to non-pipelined method 3. Proxies Compatibility with general proxies Long Polling: Works well with most proxies. Because it ultimately sends a complete HTTP response (when events occur or timeout).\nHTTP Streaming: Has problems. 
It relies on two assumptions:\nProxy will forward each chunk immediately → Not guaranteed Browser will immediately execute arrived JS chunks → Not guaranteed Reverse proxy problems Reverse proxies appear as the actual server from the client\u0026rsquo;s perspective, but play the role of forwarding requests to the real server behind them.\nClient → [Reverse Proxy] → Actual Server (Nginx, Apache, etc.) Both long polling and streaming work, but there are performance issues. Most proxies aren\u0026rsquo;t designed to maintain many connections for long periods.\nConnection Sharing Problem This is the most serious problem. Proxies like Apache mod_jk are designed to have multiple clients share a small number of connections.\n[Apache mod_jk connection pool: 8 connections] Client A\u0026#39;s long poll → Connection 1 occupied (30 seconds...) Client B\u0026#39;s long poll → Connection 2 occupied (30 seconds...) Client C\u0026#39;s long poll → Connection 3 occupied (30 seconds...) ... Client H\u0026#39;s long poll → Connection 8 occupied (30 seconds...) Client I\u0026#39;s regular request → No connection! Waiting... Client J\u0026#39;s regular request → No connection! Waiting... When all 8 connections are occupied by long polls, all other requests (whether long poll or regular) get blocked. This is called connection starvation.\nRoot cause: Synchronous vs Asynchronous Model Operation Long Poll Impact Synchronous One thread/connection per request Severe resource exhaustion Asynchronous Event-based, minimal resources per connection Minimal impact Synchronous examples: Apache mod_jk, Java Servlet 2.5 Asynchronous examples: Nginx, Node.js, Java Servlet 3.0+\nConclusion: When using long polling/streaming, must avoid connection sharing. HTTP\u0026rsquo;s basic assumption is \u0026ldquo;each request completes as quickly as possible,\u0026rdquo; but long poll breaks this assumption.\n4. HTTP Responses This is simple. 
Just follow standard HTTP.\nWhen server successfully receives a request, respond with 200 OK Response timing: event occurrence, state change, or timeout Response body includes actual event/state/timeout information Nothing special, just comply with HTTP specifications.\n5. Timeouts Dilemma Setting long poll timeout values is tricky:\nIf set too high:\nMay receive 408 Request Timeout from server May receive 504 Gateway Timeout from proxy Slow detection of disconnected network connections If set too low:\nUnnecessary requests/responses increase Network traffic waste Server load increase Experimental results and recommended values Browser default timeout: 300 seconds (5 minutes) Values that succeeded in experiments: up to 120 seconds Safe recommended value: 30 seconds Most network infrastructure (proxies, load balancers, etc.) don\u0026rsquo;t have timeouts as long as browsers. Intermediate equipment can disconnect first.\nRecommendations for network equipment vendors To be long polling compatible, timeouts must be set significantly longer than 30 seconds. \u0026ldquo;Significantly\u0026rdquo; here means several times the average network round-trip time or more.\n6. Impact on Intermediary Entities Transparency problem Long poll requests are indistinguishable from regular HTTP requests from the perspective of intermediate equipment (proxies, gateways, etc.). There\u0026rsquo;s no way to tell them \u0026ldquo;this is a long poll, so handle it specially.\u0026rdquo;\nThis can cause intermediate equipment to do unnecessary work:\nAttempt caching (shouldn\u0026rsquo;t cache real-time data) Apply timeouts (long poll is supposed to take long) Attempt connection reuse (long poll has long occupation time) Cache prevention The most important thing is cache prevention. If real-time data is cached, clients receive past data.\nHeader that must be set:\nCache-Control: no-cache This header must be included in both requests and responses. 
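As a concrete illustration, a minimal long-poll handler might look like the following sketch. This is hypothetical code of my own: handleLongPoll and waitForEvent are invented names, and the 30-second default matches the recommendation in the Timeouts section.

```javascript
// Always attach Cache-Control: no-cache so intermediaries never serve
// stale data for a long-poll response.
function longPollHeaders() {
  return { "Cache-Control": "no-cache", "Content-Type": "application/json" };
}

// Hold the response open until an event arrives or the timeout fires.
function handleLongPoll(req, res, waitForEvent, timeoutMs = 30000) {
  let done = false;
  const respond = (payload) => {
    if (done) return; // respond exactly once
    done = true;
    clearTimeout(timer);
    res.writeHead(200, longPollHeaders());
    res.end(JSON.stringify(payload));
  };
  // Timeout path: an empty/timeout response tells the client to re-poll.
  const timer = setTimeout(() => respond({ timeout: true }), timeoutMs);
  // Event path: respond as soon as data becomes available.
  waitForEvent(respond);
}
```

The single `respond` gate covers both paths, so a late event after the timeout (or vice versa) never produces a second write on an already-finished response.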
It\u0026rsquo;s a standard HTTP header, so most intermediate equipment understands and respects it.\n7. Security Considerations RFC 6202 is a document that describes existing usage patterns of HTTP, not proposing new features. Therefore, it doesn\u0026rsquo;t create new security vulnerabilities. However, there are security issues that exist in already deployed solutions.\n1. Injection attacks (Cross-Domain Long Polling) Problem situation:\nWhen using the JSONP method in cross-domain long polling, the browser executes JavaScript returned by the server.\n// Server response (JSONP) callback({\u0026#34;price\u0026#34;: 52300}); If the server is vulnerable to injection attacks, attackers can insert malicious code:\n// Response manipulated by attacker callback({\u0026#34;price\u0026#34;: 52300}); stealCookies(); The browser executes this as is.\nCountermeasures:\nThorough server-side input validation Use CORS and avoid JSONP Set Content-Type headers accurately 2. DoS (Denial of Service) attacks Problem situation:\nLong polling and HTTP streaming must maintain many connections for long periods. If attackers open a large number of long poll connections:\nAttacker → Opens 1,000 connections (each waiting 30 seconds) ↓ Server resource exhaustion → Normal users cannot receive service Regular HTTP requests end quickly, so resource occupation time per connection is short. But long poll is intentionally maintained for long periods, making it vulnerable to DoS.\nCountermeasures:\nLimit connections per IP Allow long poll only for authenticated users Apply rate limiting Use asynchronous servers (minimize resources per connection) Conclusion By reading RFC 6202, I learned how server push events were created in the past. 
It was good to learn in more detail about the workings and problems of the polling, streaming, and SSE mechanisms I already knew.\nMy main takeaway from the document is that these mechanisms extend the HTTP protocol rather than define a new one; because HTTP was never designed for bidirectional asynchronous communication, they effectively bend it into serving server-initiated event push.\nIn the next post, I\u0026rsquo;ll walk through the RFC 6455 WebSocket document and connect the problems raised in RFC 6202 to how WebSocket came about.\n","date":"2026-01-12T21:18:15+09:00","image":"/posts/260112_before_websocket/featured.png","permalink":"/en/posts/260112_before_websocket/","title":"Bidirectional Communication Before WebSocket"},{"content":"Looking Back on 2025 In July and November 2024, my ulcerative colitis worsened to the point where no treatment options remained, leading to two surgeries to remove my entire colon. I needed time to recover, so I left my first company at the end of January 2025. Though I learned a lot during those 2 years and 2 months and had many regrets, that chapter came to a close.\nHaving worked part-time and full-time jobs since my student days, it felt awkward and anxiety-inducing at first to just rest at home without doing anything. However, I gradually adapted. While the aftermath of the surgery prevented me from doing anything active, I was able to play the games I wanted to and heal from the mental struggles I had been through.\nLooking back, even during my recovery period, I obtained the Information Processing Engineer and SQLD certifications and continued working on side projects to maintain my programming skills. (I guess I just can\u0026rsquo;t completely rest and do nothing\u0026hellip;)\n2025 can be summarized with two words: \u0026lsquo;resignation\u0026rsquo; and \u0026lsquo;recovery\u0026rsquo;. 
There were many difficult times due to poor health, but I want to praise myself for enduring it all.\nPlans for 2026 As the new year begins, I\u0026rsquo;ve recovered significantly and am preparing to return to work. My areas of interest are the financial sector or e-commerce domain, and I\u0026rsquo;m preparing accordingly.\nWhile the future is always uncertain, I believe that good opportunities will come with steady preparation.\n","date":"2026-01-01T23:03:28+09:00","image":"/posts/260101_plan/featured.jpg","permalink":"/en/posts/260101_plan/","title":"A Brief Reflection on 2025 and Plans for 2026"},{"content":"I found an interesting game on Steam and wanted to share it. 😂\nThe Farmer Was Replaced is a game where you have a drone that you control by writing Python code to automate crop farming. lol 😂\nThe game actually explains Python syntax pretty well, so it might not be a bad way for Python beginners to learn while having fun?\nI haven\u0026rsquo;t enabled Korean mode yet so it\u0026rsquo;s showing in English, but it seems to support Korean as well..!\nWhen I first started playing, I was confused because I couldn\u0026rsquo;t even declare variables. But turns out you need to unlock functions, variables, etc. by spending resources earned from farming crops\u0026hellip;\nIt\u0026rsquo;s pretty addictive, and the core content seems to be implementing optimized algorithms to expand your farm, plant crops, and harvest them at the right timing as they grow.\nAs of today, it\u0026rsquo;s 20% off on Steam, so if you\u0026rsquo;re interested (developers?), I\u0026rsquo;d recommend giving it a try\u0026hellip;? 😄\nSteam Link: The Farmer Was Replaced\n","date":"2025-12-15T16:26:41+09:00","image":"/posts/251215_game/featured.png","permalink":"/en/posts/251215_game/","title":"Coding Game Recommendation: The Farmer Was Replaced"},{"content":"What is Giscus? 
Giscus is an open-source comment system that uses GitHub Discussions as its backend.\nKey Features ✅ Completely Free (leverages GitHub features) ✅ No Server Required (GitHub handles everything) ✅ Full Markdown Support (code blocks, images, tables, etc.) ✅ Reactions (👍, ❤️, 😄, etc.) ✅ GitHub Notifications (get notified when comments are posted) ✅ Dark Mode (auto-syncs with blog theme) ✅ Data Ownership (stored in your repository) Differences from Utterances Feature Giscus Utterances Backend GitHub Discussions GitHub Issues Reactions ✅ ❌ Nested Replies ✅ (nested) ⚠️ (flat) Comment Sorting ✅ ⚠️ Best For Comments Issue tracking Conclusion: Giscus is the superior choice over Utterances.\nPrerequisites Requirements GitHub account Public GitHub repository (your blog repository) Hugo + Blowfish theme Limitations ⚠️ Public repositories only (Private repositories have limited Discussions functionality) ⚠️ GitHub account required (no anonymous comments) Step 1: Enable GitHub Discussions 1.1 Navigate to Repository Settings Go to your blog repository on GitHub\nExample: https://github.com/0AndWild/0AndWild.github.io Click the Settings tab\n1.2 Enable Discussions Scroll down to find the Features section\nCheck the Discussions checkbox ✅\nIt will save automatically\n1.3 Verify Confirm that the Discussions tab appears at the top of your repository\nCode | Issues | Pull requests | Discussions | ← Newly created! 
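If you prefer to verify from a script rather than the UI, the GitHub GraphQL API exposes a hasDiscussionsEnabled flag on repositories. Here is a small sketch; it assumes you supply a personal access token via a GITHUB_TOKEN environment variable, and checkRepo is a name invented for this example.

```javascript
// Sketch: confirm Discussions is enabled via the GitHub GraphQL API.
const query = `
  query ($owner: String!, $name: String!) {
    repository(owner: $owner, name: $name) { hasDiscussionsEnabled }
  }`;

// Pure helper: extract the flag from a GraphQL response body.
function discussionsEnabled(responseJson) {
  return Boolean(responseJson?.data?.repository?.hasDiscussionsEnabled);
}

// Requires Node 18+ for the global fetch API.
async function checkRepo(owner, name) {
  const res = await fetch("https://api.github.com/graphql", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, variables: { owner, name } }),
  });
  return discussionsEnabled(await res.json());
}
// e.g. checkRepo("0AndWild", "0AndWild.github.io").then(console.log);
```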
Step 2: Install Giscus App 2.1 Install Giscus GitHub App Visit https://github.com/apps/giscus\nClick the Install button\nChoose permission scope:\nAll repositories (all repositories) Only select repositories (specific repositories - recommended) Select your blog repository:\n0AndWild/0AndWild.github.io Click Install\n2.2 Verify Permissions Giscus requests the following permissions:\n✅ Read access to discussions (read discussions) ✅ Write access to discussions (write discussions) ✅ Read access to metadata (read metadata) Step 3: Generate Giscus Configuration 3.1 Visit Giscus Website Go to https://giscus.app\n3.2 Connect Repository Enter in the Repository section:\n0AndWild/0AndWild.github.io You should see a success message below:\n✅ Success! This repository meets all criteria. If you see an error:\nVerify Discussions is enabled Verify Giscus App is installed Verify the repository is Public 3.3 Page ↔️ Discussion Mapping Choose in the Discussion Mapping section:\nRecommended: pathname (path name) Mapping: Select pathname Each blog post\u0026rsquo;s path becomes the Discussion title.\nExample:\nPost: /posts/giscus-guide/ Discussion title: posts/giscus-guide Alternatives: URL: Uses full URL (problematic if domain changes) title: Uses post title (problematic if title changes) og:title: OpenGraph title specific term: Manually specified Recommendation: Use pathname\n3.4 Select Discussion Category Choose from the Discussion Category dropdown:\nRecommended: Announcements Category: Select Announcements Characteristics:\nOnly admins can create new Discussions Anyone can comment Ideal for blog posts Alternative: General Anyone can create Discussions More open Recommendation: Announcements (best for blogs)\n3.5 Feature Selection Enable Reactions ✅ Enable reactions Users can react with 👍, ❤️, 😄, etc.\nEmit Metadata □ Emit metadata (recommended to leave unchecked) Unnecessary feature, better to keep it off\nComment Input Position ⚪ Above comments ⚪ Below comments (recommended) 
Recommendation: Below comments\nEncourages users to read existing comments first Lazy Loading ✅ Lazy loading Improves page load speed (recommended)\n3.6 Theme Selection Recommended: preferred_color_scheme Theme: preferred_color_scheme Behavior:\nAutomatically switches based on user\u0026rsquo;s system settings Dark mode ↔️ Light mode automatic Alternatives: light: Always light theme dark: Always dark theme transparent_dark: Transparent dark Other GitHub themes Recommendation: preferred_color_scheme (auto-switching)\n3.7 Language Setting Language: en (English) Step 4: Copy Generated Code 4.1 Copy Script Copy the generated code from the Enable giscus section at the bottom of the page:\n\u0026lt;script src=\u0026#34;https://giscus.app/client.js\u0026#34; data-repo=\u0026#34;0AndWild/0AndWild.github.io\u0026#34; data-repo-id=\u0026#34;R_kgDOxxxxxxxx\u0026#34; data-category=\u0026#34;Announcements\u0026#34; data-category-id=\u0026#34;DIC_kwDOxxxxxxxx\u0026#34; data-mapping=\u0026#34;pathname\u0026#34; data-strict=\u0026#34;0\u0026#34; data-reactions-enabled=\u0026#34;1\u0026#34; data-emit-metadata=\u0026#34;0\u0026#34; data-input-position=\u0026#34;bottom\u0026#34; data-theme=\u0026#34;preferred_color_scheme\u0026#34; data-lang=\u0026#34;en\u0026#34; data-loading=\u0026#34;lazy\u0026#34; crossorigin=\u0026#34;anonymous\u0026#34; async\u0026gt; \u0026lt;/script\u0026gt; 4.2 Important Values data-repo-id: Repository unique ID (auto-generated) data-category-id: Category unique ID (auto-generated) These values are unique to your repository, so you must use the code generated from the Giscus website.\nStep 5: Integrate with Blowfish Theme 5.1 Create Directory From the terminal, navigate to your blog\u0026rsquo;s root directory:\nmkdir -p layouts/partials 5.2 Create comments.html File touch layouts/partials/comments.html Or create directly in your IDE/editor:\nlayouts/ └── partials/ └── comments.html ← Create new 5.3 Insert Giscus Code Add the following content to 
layouts/partials/comments.html:\n\u0026lt;!-- Giscus Comment System --\u0026gt; \u0026lt;script src=\u0026#34;https://giscus.app/client.js\u0026#34; data-repo=\u0026#34;0AndWild/0AndWild.github.io\u0026#34; data-repo-id=\u0026#34;R_kgDOxxxxxxxx\u0026#34; data-category=\u0026#34;Announcements\u0026#34; data-category-id=\u0026#34;DIC_kwDOxxxxxxxx\u0026#34; data-mapping=\u0026#34;pathname\u0026#34; data-strict=\u0026#34;0\u0026#34; data-reactions-enabled=\u0026#34;1\u0026#34; data-emit-metadata=\u0026#34;0\u0026#34; data-input-position=\u0026#34;bottom\u0026#34; data-theme=\u0026#34;preferred_color_scheme\u0026#34; data-lang=\u0026#34;en\u0026#34; data-loading=\u0026#34;lazy\u0026#34; crossorigin=\u0026#34;anonymous\u0026#34; async\u0026gt; \u0026lt;/script\u0026gt; ⚠️ Important: Replace the data-repo-id and data-category-id values with your own values!\n5.4 Configure params.toml Open config/_default/params.toml and add to the [article] section:\n[article] showComments = true # Add or verify this line # ... 
other settings If the showComments entry already exists, make sure it\u0026rsquo;s set to true.\nStep 6: Local Testing 6.1 Run Hugo Server hugo server -D 6.2 Verify in Browser http://localhost:1313 The Giscus comment widget should appear at the bottom of post pages.\n6.3 Write Test Comment Click Sign in with GitHub button Authorize GitHub OAuth Write a test comment Verify the comment displays 6.4 Check GitHub Discussions GitHub repository → Discussions tab Verify a new Discussion was created in the Announcements category Verify the Discussion title matches the post path Step 7: Deploy 7.1 Commit to Git git add layouts/partials/comments.html git add config/_default/params.toml git commit -m \u0026#34;Add Giscus comments system\u0026#34; 7.2 Push to GitHub git push origin main 7.3 Check GitHub Actions GitHub Actions will automatically build and deploy.\nCheck deployment status:\nGitHub repository → Actions tab 7.4 Verify Deployed Site https://0andwild.github.io Verify the comment widget displays correctly on post pages.\nAdvanced Configuration Dynamic Dark Mode and Language Setting (Recommended) A complete solution to make Giscus automatically adapt to Blowfish theme\u0026rsquo;s dark mode toggle and language switching.\nComplete Dynamic Configuration Full code for layouts/partials/comments.html:\n\u0026lt;!-- Giscus Comments with Dynamic Theme and Language --\u0026gt; {{ $lang := .Site.Language.Lang }} {{ $translationKey := .File.TranslationBaseName }} \u0026lt;script\u0026gt; (function() { // Get current theme (dark/light) function getGiscusTheme() { const isDark = document.documentElement.classList.contains(\u0026#39;dark\u0026#39;); return isDark ? 
\u0026#39;dark_tritanopia\u0026#39; : \u0026#39;light_tritanopia\u0026#39;; } // Get language from Hugo template const currentLang = \u0026#39;{{ $lang }}\u0026#39;; // Use file directory path for unified comments across languages // Example: \u0026#34;posts/subscription_alert\u0026#34; for both index.ko.md and index.en.md const discussionId = \u0026#39;{{ .File.Dir | replaceRE \u0026#34;^content/\u0026#34; \u0026#34;\u0026#34; | replaceRE \u0026#34;/$\u0026#34; \u0026#34;\u0026#34; }}\u0026#39;; // Wait for DOM to be ready if (document.readyState === \u0026#39;loading\u0026#39;) { document.addEventListener(\u0026#39;DOMContentLoaded\u0026#39;, initGiscus); } else { initGiscus(); } function initGiscus() { // Create and insert Giscus script with dynamic settings const script = document.createElement(\u0026#39;script\u0026#39;); script.src = \u0026#39;https://giscus.app/client.js\u0026#39;; script.setAttribute(\u0026#39;data-repo\u0026#39;, \u0026#39;0AndWild/0AndWild.github.io\u0026#39;); script.setAttribute(\u0026#39;data-repo-id\u0026#39;, \u0026#39;R_kgDOQAqZFA\u0026#39;); script.setAttribute(\u0026#39;data-category\u0026#39;, \u0026#39;General\u0026#39;); script.setAttribute(\u0026#39;data-category-id\u0026#39;, \u0026#39;DIC_kwDOQAqZFM4CwwRg\u0026#39;); script.setAttribute(\u0026#39;data-mapping\u0026#39;, \u0026#39;specific\u0026#39;); script.setAttribute(\u0026#39;data-term\u0026#39;, discussionId); script.setAttribute(\u0026#39;data-strict\u0026#39;, \u0026#39;0\u0026#39;); script.setAttribute(\u0026#39;data-reactions-enabled\u0026#39;, \u0026#39;1\u0026#39;); script.setAttribute(\u0026#39;data-emit-metadata\u0026#39;, \u0026#39;0\u0026#39;); script.setAttribute(\u0026#39;data-input-position\u0026#39;, \u0026#39;bottom\u0026#39;); script.setAttribute(\u0026#39;data-theme\u0026#39;, getGiscusTheme()); script.setAttribute(\u0026#39;data-lang\u0026#39;, currentLang); script.setAttribute(\u0026#39;data-loading\u0026#39;, \u0026#39;lazy\u0026#39;); 
script.setAttribute(\u0026#39;crossorigin\u0026#39;, \u0026#39;anonymous\u0026#39;); script.async = true; // Find giscus container or create one const container = document.querySelector(\u0026#39;.giscus-container\u0026#39;) || document.currentScript?.parentElement; if (container) { container.appendChild(script); } } // Monitor theme changes and update Giscus function updateGiscusTheme() { const iframe = document.querySelector(\u0026#39;iframe.giscus-frame\u0026#39;); if (!iframe) return; const theme = getGiscusTheme(); try { iframe.contentWindow.postMessage( { giscus: { setConfig: { theme: theme } } }, \u0026#39;https://giscus.app\u0026#39; ); } catch (error) { console.log(\u0026#39;Giscus theme update delayed, will retry...\u0026#39;); } } // Watch for theme changes using MutationObserver const observer = new MutationObserver((mutations) =\u0026gt; { mutations.forEach((mutation) =\u0026gt; { if (mutation.attributeName === \u0026#39;class\u0026#39;) { // Delay update to ensure iframe is ready setTimeout(updateGiscusTheme, 100); } }); }); // Start observing after a short delay setTimeout(() =\u0026gt; { observer.observe(document.documentElement, { attributes: true, attributeFilter: [\u0026#39;class\u0026#39;] }); }, 500); // Update theme when Giscus iframe loads window.addEventListener(\u0026#39;message\u0026#39;, (event) =\u0026gt; { if (event.origin !== \u0026#39;https://giscus.app\u0026#39;) return; if (event.data.giscus) { // Giscus is ready, update theme setTimeout(updateGiscusTheme, 200); } }); })(); \u0026lt;/script\u0026gt; \u0026lt;style\u0026gt; /* Ensure Giscus iframe has proper height and displays all content */ .giscus-container { min-height: 300px; } .giscus-container iframe.giscus-frame { width: 100%; border: none; min-height: 300px; } /* Make sure comment actions are visible */ .giscus { overflow: visible !important; } \u0026lt;/style\u0026gt; \u0026lt;div class=\u0026#34;giscus-container\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; How It Works 1. 
Dynamic Language Setting {{ $lang := .Site.Language.Lang }} const currentLang = \u0026#39;{{ $lang }}\u0026#39;; Gets current page language from Hugo template Korean page: ko, English page: en Sets Giscus to the corresponding language Result:\nKorean page → Giscus UI displays in Korean English page → Giscus UI displays in English Language switch triggers page reload with automatic update 2. Dynamic Dark Mode Setting function getGiscusTheme() { const isDark = document.documentElement.classList.contains(\u0026#39;dark\u0026#39;); return isDark ? \u0026#39;dark_tritanopia\u0026#39; : \u0026#39;light_tritanopia\u0026#39;; } Blowfish theme adds \u0026lt;html class=\u0026quot;dark\u0026quot;\u0026gt; in dark mode Detects this to determine theme Uses dark_tritanopia / light_tritanopia themes (colorblind-friendly) Result:\nPage load: Loads Giscus with current theme state Dark mode toggle click: Real-time Giscus theme change 3. Unified Comments Across Languages const discussionId = \u0026#39;{{ .File.Dir | replaceRE \u0026#34;^content/\u0026#34; \u0026#34;\u0026#34; | replaceRE \u0026#34;/$\u0026#34; \u0026#34;\u0026#34; }}\u0026#39;; Uses file directory path as Discussion ID content/posts/subscription_alert/index.ko.md → posts/subscription_alert content/posts/subscription_alert/index.en.md → posts/subscription_alert Same ID means Korean/English versions share the same comments Result:\nComments written on Korean post Also display on English post Separate Discussions created per post 4. Real-time Theme Change Detection const observer = new MutationObserver((mutations) =\u0026gt; { mutations.forEach((mutation) =\u0026gt; { if (mutation.attributeName === \u0026#39;class\u0026#39;) { setTimeout(updateGiscusTheme, 100); } }); }); MutationObserver detects HTML class changes Immediately detects dark mode toggle clicks Sends theme change command to Giscus iframe via postMessage Testing Method # 1. Run local server hugo server -D # 2. 
Verify in browser http://localhost:1313/posts/subscription_alert/ Test Items:\n✅ Page load displays Giscus with current theme (light/dark) ✅ Dark mode toggle click immediately changes Giscus theme ✅ Language switch (ko → en) changes Giscus language ✅ Korean/English pages display same comments Changing Theme Options To use different themes, modify the getGiscusTheme() function:\n// Basic theme function getGiscusTheme() { const isDark = document.documentElement.classList.contains(\u0026#39;dark\u0026#39;); return isDark ? \u0026#39;dark\u0026#39; : \u0026#39;light\u0026#39;; } // High contrast theme function getGiscusTheme() { const isDark = document.documentElement.classList.contains(\u0026#39;dark\u0026#39;); return isDark ? \u0026#39;dark_high_contrast\u0026#39; : \u0026#39;light_high_contrast\u0026#39;; } // GitHub style theme function getGiscusTheme() { const isDark = document.documentElement.classList.contains(\u0026#39;dark\u0026#39;); return isDark ? \u0026#39;dark_dimmed\u0026#39; : \u0026#39;light\u0026#39;; } Available themes:\nlight / dark light_high_contrast / dark_high_contrast light_tritanopia / dark_tritanopia (colorblind-friendly) dark_dimmed transparent_dark preferred_color_scheme (follows system settings) Static Theme Configuration (Simple Method) If dynamic changes aren\u0026rsquo;t needed, you can configure statically:\n\u0026lt;script src=\u0026#34;https://giscus.app/client.js\u0026#34; data-repo=\u0026#34;0AndWild/0AndWild.github.io\u0026#34; data-repo-id=\u0026#34;R_kgDOxxxxxxxx\u0026#34; data-category=\u0026#34;General\u0026#34; data-category-id=\u0026#34;DIC_kwDOxxxxxxxx\u0026#34; data-mapping=\u0026#34;pathname\u0026#34; data-theme=\u0026#34;preferred_color_scheme\u0026#34; data-lang=\u0026#34;en\u0026#34; crossorigin=\u0026#34;anonymous\u0026#34; async\u0026gt; \u0026lt;/script\u0026gt; Pros: Simple Cons: No real-time theme changes, comments separated by language\nHide Comments on Specific Posts To hide comments on specific posts only, add 
to that post\u0026rsquo;s front matter:\n--- title: \u0026#34;Post Without Comments\u0026#34; showComments: false # Hide comments on this post only --- Separate Comments by Category To use different Discussion categories for posts in different categories:\n\u0026lt;!-- Conditional category configuration --\u0026gt; \u0026lt;script\u0026gt; const category = {{ if in .Params.categories \u0026#34;Tutorial\u0026#34; }} \u0026#34;DIC_kwDOxxxxTutorial\u0026#34; {{ else }} \u0026#34;DIC_kwDOxxxxGeneral\u0026#34; {{ end }}; \u0026lt;/script\u0026gt; \u0026lt;script src=\u0026#34;https://giscus.app/client.js\u0026#34; ... data-category-id=\u0026#34;{{ category }}\u0026#34; ...\u0026gt; \u0026lt;/script\u0026gt; Troubleshooting Comment Widget Not Displaying Cause 1: Discussions Not Enabled Solution: GitHub repository → Settings → Check Discussions Cause 2: Giscus App Not Installed Solution: Install at https://github.com/apps/giscus Cause 3: Repository ID Error Solution: Regenerate code at giscus.app Cause 4: showComments Setting Missing # config/_default/params.toml [article] showComments = true # Verify Only Login Button Shows, Can\u0026rsquo;t Comment Cause: GitHub OAuth Authorization Needed 1. Click \u0026#34;Sign in with GitHub\u0026#34; 2. Authorize OAuth permissions 3. Redirect to repository 4. Can write comments Comments Not Saving Cause: Repository Permission Issue Check: 1. Is the repository Public? 2. Is the repository included in Giscus App permissions? 3. Does the Discussion category exist? Dark Mode Not Syncing Solution: Add JavaScript Sync Code Refer to \u0026ldquo;Advanced Configuration \u0026gt; Automatic Dark Mode Switching\u0026rdquo; above\nManaging Giscus Comment Management Manage via GitHub Discussions 1. GitHub repository → Discussions tab 2. Click the relevant Discussion 3. Management actions: - Edit comment (own comments only) - Delete comment (admin) - Block user (admin) - Lock Discussion (admin) Handling Spam Comments 1. 
Find spam comment in GitHub Discussions 2. ... menu next to comment → \u0026#34;Delete\u0026#34; 3. Block user: Profile → Block user Notification Settings Receive Comment Notifications via GitHub 1. GitHub → Settings → Notifications 2. Add repository to Watching 3. Configure email notifications Receive Notifications for Specific Discussions Only 1. Discussions tab → Relevant Discussion 2. \u0026#34;Subscribe\u0026#34; button on right 3. Select \u0026#34;Notify me\u0026#34; Statistics and Analytics View Comment Statistics In GitHub Discussions:\n1. Discussions tab 2. Check number of Discussions by category 3. Check comment count for each Discussion Utilize GitHub Insights GitHub repository → Insights → Community → Check Discussions activity Cost and Limitations Cost Completely Free\nOnly need a GitHub account Unlimited comments within repository size limits Limitations GitHub API Rate Limit 60 requests/hour (unauthenticated) 5,000 requests/hour (authenticated) Giscus is optimized with caching, so no issues Repository Size GitHub Free: 1GB per repository Text comments alone won\u0026rsquo;t reach the limit Discussions Limit None (unlimited) Alternative Comparisons Giscus vs Utterances Item Giscus Utterances Backend Discussions Issues Reactions ✅ ❌ Nested Replies Nested support Flat Recommendation ⭐⭐⭐⭐⭐ ⭐⭐⭐ Conclusion: Giscus is recommended\nGiscus vs Disqus Item Giscus Disqus Cost Free Free (with ads) Ads ❌ ✅ Anonymous Comments ❌ ✅ (Guest) Markdown ✅ ⚠️ Data Ownership ✅ ❌ Recommendation Developer blogs General blogs Migration Guide Utterances → Giscus 1. Convert GitHub Issues to Discussions - Manual work required (no automation) - Or leave Issues as-is and start fresh with Giscus 2. Replace comments.html file - Delete Utterances code - Add Giscus code 3. Deploy Disqus → Giscus 1. Export Disqus data (XML) 2. 
Manual migration to GitHub Discussions - No automation tools available - Need to write custom script - Or starting fresh recommended Additional Resources Official Documentation Giscus Official Site Giscus GitHub Community Giscus Discussions Blowfish Documentation Checklist Installation completion checklist:\nGitHub Discussions enabled Giscus App installed Created layouts/partials/comments.html Inserted Giscus code (with your own IDs) Set showComments = true in params.toml Local testing complete Pushed to GitHub Verified on deployed site Wrote test comment Verified creation in GitHub Discussions Conclusion Giscus is the most suitable comment system for Hugo/GitHub Pages blogs:\nSummary of Advantages ✅ Completely free ✅ Simple setup (10 minutes) ✅ No server required ✅ Full Markdown support ✅ GitHub integration ✅ Data ownership\nDisadvantages ❌ GitHub account required (no anonymous comments) ❌ Best for technical blogs (barrier for general users)\nRecommended For ✅ Developer blogs ✅ Technical documentation ✅ Open source projects ","date":"2025-10-17T12:00:00+09:00","image":"/posts/251017_comments_giscus/featured.png","permalink":"/en/posts/251017_comments_giscus/","title":"Adding Comments to Hugo Blog with Giscus"},{"content":"Overview This guide provides a comprehensive comparison of all methods to add comment functionality to blogs built with static site generators (Hugo). We present solutions for various requirements including anonymous comments, GitHub login, and social logins.\nComment System Classification By Authentication Method Authentication Systems GitHub Only Giscus, Utterances Anonymous Supported Remark42, Commento, Comentario, HashOver Anonymous + Social Login Remark42, Commento, Disqus Social Login Only Disqus, Hyvor Talk By Hosting Method Hosting Systems SaaS (No Management) Giscus, Utterances, Disqus, Hyvor Talk Self-Hosted Remark42, Commento, Comentario, HashOver Hybrid Cusdis (Free Vercel deployment) 1. 
Giscus (Highly Recommended - For GitHub Users) Concept Comment system using GitHub Discussions as backend\nHow It Works 1. User visits blog ↓ 2. Giscus widget loads ↓ 3. Login with GitHub OAuth ↓ 4. Write comment ↓ 5. Auto-saved to GitHub Discussions ↓ 6. Displayed on blog in real-time Advantages ✅ Completely free (leverages GitHub features) ✅ No server required (GitHub handles backend) ✅ Data ownership (stored in your repository) ✅ Markdown support (code blocks, images, etc.) ✅ Reactions support (👍, ❤️, etc.) ✅ Notifications (comment alerts via GitHub notifications) ✅ Dark mode (syncs with blog theme) ✅ Spam prevention (requires GitHub account) ✅ Easy management (manage in GitHub Discussions) ✅ Searchable (search comments via GitHub search) Disadvantages ❌ No anonymous comments (GitHub account required) ❌ Best for tech blogs (general users may not have GitHub accounts) ❌ GitHub dependency (comments unavailable during GitHub outages) Implementation Difficulty ⭐⭐ (2/5)\nSetup Method Step 1: Enable GitHub Discussions 1. GitHub Repository → Settings 2. 
Features section → Check Discussions Step 2: Configure Giscus Visit giscus.app Enter repository: username/repository Select settings: Page ↔️ Discussion mapping: pathname (recommended) Discussion category: Announcements or General Features: enable reactions, show the comment box above the comments Theme: Match your blog theme Step 3: Add to Blowfish \u0026lt;!-- layouts/partials/comments.html --\u0026gt; \u0026lt;script src=\u0026#34;https://giscus.app/client.js\u0026#34; data-repo=\u0026#34;0AndWild/0AndWild.github.io\u0026#34; data-repo-id=\u0026#34;YOUR_REPO_ID\u0026#34; data-category=\u0026#34;Announcements\u0026#34; data-category-id=\u0026#34;YOUR_CATEGORY_ID\u0026#34; data-mapping=\u0026#34;pathname\u0026#34; data-strict=\u0026#34;0\u0026#34; data-reactions-enabled=\u0026#34;1\u0026#34; data-emit-metadata=\u0026#34;0\u0026#34; data-input-position=\u0026#34;bottom\u0026#34; data-theme=\u0026#34;preferred_color_scheme\u0026#34; data-lang=\u0026#34;en\u0026#34; crossorigin=\u0026#34;anonymous\u0026#34; async\u0026gt; \u0026lt;/script\u0026gt; Step 4: Configure params.toml [article] showComments = true Theme Synchronization (Dark Mode) \u0026lt;script\u0026gt; // Re-apply the Giscus theme whenever the blog theme toggles function syncGiscusTheme() { const frame = document.querySelector(\u0026#39;iframe.giscus-frame\u0026#39;); if (!frame) return; const isDark = document.documentElement.classList.contains(\u0026#39;dark\u0026#39;); frame.contentWindow.postMessage({ giscus: { setConfig: { theme: isDark ? \u0026#39;dark\u0026#39; : \u0026#39;light\u0026#39; } } }, \u0026#39;https://giscus.app\u0026#39;); } // Watch class changes on the html element, which is how the theme toggle works new MutationObserver(syncGiscusTheme).observe(document.documentElement, { attributes: true, attributeFilter: [\u0026#39;class\u0026#39;] }); \u0026lt;/script\u0026gt; Cost Completely free\nRecommended For ✅ Developer blogs ✅ Technical documentation ✅ Open source project blogs 2. Utterances Concept Comment system using GitHub Issues as backend (predecessor of Giscus)\nHow It Works 1. GitHub OAuth login ↓ 2. Write comment ↓ 3. Save to GitHub Issues (each post = 1 Issue) ↓ 4. 
Display on blog Advantages ✅ Completely free ✅ Lightweight (TypeScript) ✅ Simple setup ✅ Markdown support Disadvantages ❌ Uses Issues (less suitable than Discussions) ❌ Fewer features than Giscus ❌ No anonymous comments Giscus vs Utterances Feature Giscus Utterances Backend Discussions Issues Reactions ✅ ❌ Nested Replies ✅ (nested) ⚠️ (flat) Suitability Comment-specific Issue tracking Conclusion: Giscus is a superior alternative to Utterances\nImplementation Difficulty ⭐⭐ (2/5)\nSetup Method \u0026lt;!-- layouts/partials/comments.html --\u0026gt; \u0026lt;script src=\u0026#34;https://utteranc.es/client.js\u0026#34; repo=\u0026#34;username/repository\u0026#34; issue-term=\u0026#34;pathname\u0026#34; theme=\u0026#34;github-light\u0026#34; crossorigin=\u0026#34;anonymous\u0026#34; async\u0026gt; \u0026lt;/script\u0026gt; Recommended For Unless there\u0026rsquo;s a specific reason, use Giscus instead 3. Remark42 (Highly Recommended - Anonymous + Social Login) Concept Open-source self-hosted comment system supporting anonymous and various social logins\nHow It Works 1. Deploy Remark42 server (Docker) ↓ 2. Insert Remark42 script on blog ↓ 3. User chooses: - Write anonymous comment - Login with GitHub/Google/Twitter and write ↓ 4. Save to Remark42 DB ↓ 5. Display on blog Advantages ✅ Anonymous comments supported (can be toggled on/off) ✅ Various social logins (GitHub, Google, Facebook, Twitter, Email) ✅ Completely free (open source) ✅ No ads ✅ Data ownership (your server) ✅ Markdown support ✅ Comment edit/delete ✅ Admin mode (approve/block/delete comments) ✅ Notifications (Email/Telegram) ✅ Import/Export (migrate from other systems) ✅ Voting (upvote/downvote) ✅ Spam filter Disadvantages ❌ Self-hosting required (Docker server) ❌ Maintenance responsibility ❌ Hosting costs ($5/month~, free tier possible) Implementation Difficulty ⭐⭐⭐⭐ (4/5)\nHosting Options Option 1: Railway (Recommended) 1. Sign up for Railway.app 2. 
\u0026#34;New Project\u0026#34; → \u0026#34;Deploy from GitHub\u0026#34; 3. Select Remark42 Docker image 4. Configure environment variables: - REMARK_URL=https://your-remark42.railway.app - SECRET=your-random-secret - AUTH_ANON=true # Allow anonymous comments - AUTH_GITHUB_CID=your_client_id - AUTH_GITHUB_CSEC=your_client_secret Railway Free Tier:\n$5 credit per month Sufficient for small blogs Option 2: Fly.io # fly.toml app = \u0026#34;my-remark42\u0026#34; [build] image = \u0026#34;umputun/remark42:latest\u0026#34; [env] REMARK_URL = \u0026#34;https://my-remark42.fly.dev\u0026#34; AUTH_ANON = \u0026#34;true\u0026#34; AUTH_GITHUB_CID = \u0026#34;xxx\u0026#34; AUTH_GITHUB_CSEC = \u0026#34;xxx\u0026#34; fly launch fly deploy Fly.io Free Tier:\n3 apps Sufficient for small blogs Option 3: Docker Compose (VPS) # docker-compose.yml version: \u0026#39;3.8\u0026#39; services: remark42: image: umputun/remark42:latest restart: always environment: - REMARK_URL=https://remark.your-blog.com - SECRET=your-secret-key-change-this - AUTH_ANON=true # Allow anonymous - AUTH_GITHUB_CID=xxx # GitHub login - AUTH_GITHUB_CSEC=xxx - AUTH_GOOGLE_CID=xxx # Google login - AUTH_GOOGLE_CSEC=xxx - ADMIN_SHARED_ID=github_username # Admin volumes: - ./data:/srv/var ports: - \u0026#34;8080:8080\u0026#34; docker-compose up -d Blog Embed Code \u0026lt;!-- layouts/partials/comments.html --\u0026gt; \u0026lt;div id=\u0026#34;remark42\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; \u0026lt;script\u0026gt; var remark_config = { host: \u0026#39;https://your-remark42.railway.app\u0026#39;, site_id: \u0026#39;0andwild-blog\u0026#39;, components: [\u0026#39;embed\u0026#39;], theme: \u0026#39;light\u0026#39;, locale: \u0026#39;en\u0026#39;, max_shown_comments: 10, simple_view: false, no_footer: false }; (function(c) { for(var i = 0; i \u0026lt; c.length; i++){ var d = document, s = d.createElement(\u0026#39;script\u0026#39;); s.src = remark_config.host + \u0026#39;/web/\u0026#39; +c[i] +\u0026#39;.js\u0026#39;; 
s.defer = true; (d.head || d.body).appendChild(s); } })(remark_config.components || [\u0026#39;embed\u0026#39;]); \u0026lt;/script\u0026gt; Anonymous + GitHub Simultaneous Configuration # Environment variables AUTH_ANON=true # Allow anonymous AUTH_GITHUB_CID=xxx # GitHub OAuth App ID AUTH_GITHUB_CSEC=xxx # GitHub OAuth App Secret ANON_VOTE=false # Disable voting for anonymous (spam prevention) Users can choose:\n\u0026ldquo;Comment anonymously\u0026rdquo; \u0026ldquo;Login with GitHub\u0026rdquo; Admin Features # Designate admin ADMIN_SHARED_ID=github_yourusername # Or by email ADMIN_SHARED_EMAIL=you@example.com Admin capabilities:\nDelete comments Block users Pin comments Read-only mode Cost Railway: Free or $5/month Fly.io: Free tier available VPS (DigitalOcean, etc.): $5/month~ Recommended For ✅ Want both anonymous and social login ✅ Users comfortable with Docker ✅ Want complete data control 4. Commento / Comentario Concept Privacy-focused lightweight comment system\nCommento vs Comentario Item Commento Comentario Status Development stopped Actively developed (Commento fork) License MIT MIT Language Go Go Recommend ❌ ✅ Conclusion: Comentario recommended\nComentario Advantages ✅ Anonymous comments supported ✅ Social logins (GitHub, Google, GitLab, SSO) ✅ Lightweight (Go-based) ✅ Privacy-focused ✅ Markdown support ✅ Voting feature Disadvantages ❌ Self-hosting required ❌ Fewer features than Remark42 Implementation Difficulty ⭐⭐⭐⭐ (4/5)\nDocker Deployment version: \u0026#39;3.8\u0026#39; services: comentario: image: registry.gitlab.com/comentario/comentario ports: - \u0026#34;8080:8080\u0026#34; environment: - COMENTARIO_ORIGIN=https://comments.your-blog.com - COMENTARIO_BIND=0.0.0.0:8080 - COMENTARIO_POSTGRES=postgres://user:pass@db/comentario depends_on: - db db: image: postgres:15 environment: - POSTGRES_DB=comentario - POSTGRES_USER=comentario - POSTGRES_PASSWORD=change-this volumes: - postgres_data:/var/lib/postgresql/data volumes: postgres_data: Blog Embed 
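The embed script is served by the Comentario instance itself, from the origin set in COMENTARIO_ORIGIN above. As a small sanity check before wiring up the page, the script URL can be derived from that origin; a minimal sketch in plain JavaScript (the `/js/commento.js` path matches the legacy-compatible embed and may differ in newer Comentario builds):

```javascript
// Derive the embed <script> URL from the configured comments origin.
// Using the URL API avoids doubled or missing slashes in the result.
function embedSrc(origin) {
  return new URL('/js/commento.js', origin).href;
}

console.log(embedSrc('https://comments.your-blog.com'));
// → https://comments.your-blog.com/js/commento.js
```

The same origin is what the embed snippet hard-codes; keeping it in one place (e.g. a Hugo site param) avoids the server config and the page drifting apart.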
\u0026lt;script defer src=\u0026#34;https://comments.your-blog.com/js/commento.js\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;div id=\u0026#34;commento\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; Recommended For Alternative to Remark42 Want simpler system 5. Disqus (Traditional SaaS) Concept Oldest and most widely used cloud comment system\nHow It Works 1. Create Disqus account and register site ↓ 2. Insert Disqus script on blog ↓ 3. User chooses: - Guest (anonymous - requires email) - Disqus account - Facebook/Twitter/Google login ↓ 4. Save to Disqus server ↓ 5. Display on blog Advantages ✅ Extremely simple setup (5 minutes) ✅ No server required (SaaS) ✅ Guest mode (comment with just email) ✅ Social logins (Facebook, Twitter, Google) ✅ Powerful admin tools ✅ Spam filter (Akismet integration) ✅ Mobile apps (iOS/Android) ✅ Analytics/Statistics Disadvantages ❌ Shows ads (free plan) ❌ Heavy (script size) ❌ Privacy concerns (data tracking) ❌ No data ownership (Disqus servers) ❌ No GitHub login ❌ Ad removal cost ($11.99/month~) Implementation Difficulty ⭐ (1/5) - Easiest\nSetup Method Step 1: Register Disqus Site 1. Sign up at disqus.com 2. Select \u0026#34;I want to install Disqus on my site\u0026#34; 3. Enter Website Name (e.g., andwild-blog) 4. Select Category 5. 
Select Plan (Basic - Free) Step 2: Configure Blowfish # config/_default/config.toml [services.disqus] shortname = \u0026#34;andwild-blog\u0026#34; # Name created in Step 1 # config/_default/params.toml [article] showComments = true Hugo has built-in Disqus support, so comments display automatically!\nStep 3: Allow Guest Comments Disqus Dashboard → Settings → Community → Guest Commenting: Allow guests to comment (check) Ad Removal Methods Method 1: Paid Plan ($11.99/month~) Plus Plan: No ads Pro Plan: No ads + advanced features Method 2: Hide with CSS (Not recommended - may violate terms) /* Not recommended: May violate Disqus terms */ #disqus_thread iframe[src*=\u0026#34;ads\u0026#34;] { display: none !important; } Cost Free: With ads Plus: $11.99/month (no ads) Pro: $89/month (advanced features) Recommended For ✅ Want to add comments quickly ✅ Non-technical bloggers ✅ Don\u0026rsquo;t mind ads ❌ Not recommended for privacy-conscious users 6. Cusdis (Free Vercel Deployment) Concept Lightweight open-source comment system, deployable to Vercel for free\nHow It Works 1. Deploy Cusdis to Vercel (1-Click) ↓ 2. Connect PostgreSQL (Vercel free) ↓ 3. Add site in dashboard ↓ 4. Insert script on blog ↓ 5. Users comment with email + name Advantages ✅ Completely free (Vercel free tier) ✅ Anonymous comments (just email + name) ✅ Lightweight (50KB) ✅ Simple setup (Vercel 1-Click deploy) ✅ Privacy-focused ✅ Open source Disadvantages ❌ No Markdown support ❌ No social login ❌ Simple features Implementation Difficulty ⭐⭐⭐ (3/5)\nSetup Method Step 1: Deploy to Vercel 1. Visit https://cusdis.com/ 2. Click \u0026#34;Deploy with Vercel\u0026#34; 3. Connect GitHub 4. Add PostgreSQL (Vercel Storage) 5. Deployment complete Step 2: Add Site 1. Access deployed Cusdis dashboard 2. Click \u0026#34;Add Website\u0026#34; 3. Enter Domain: 0andwild.github.io 4. 
Copy App ID Step 3: Blog Embed \u0026lt;!-- layouts/partials/comments.html --\u0026gt; \u0026lt;div id=\u0026#34;cusdis_thread\u0026#34; data-host=\u0026#34;https://your-cusdis.vercel.app\u0026#34; data-app-id=\u0026#34;YOUR_APP_ID\u0026#34; data-page-id=\u0026#34;{{ .File.UniqueID }}\u0026#34; data-page-url=\u0026#34;{{ .Permalink }}\u0026#34; data-page-title=\u0026#34;{{ .Title }}\u0026#34;\u0026gt; \u0026lt;/div\u0026gt; \u0026lt;script async defer src=\u0026#34;https://your-cusdis.vercel.app/js/cusdis.es.js\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; Cost Completely free (Vercel free tier)\nRecommended For ✅ Need only simple anonymous comments ✅ Want completely free solution ✅ Have Vercel experience 7. HashOver Concept PHP-based fully anonymous comment system\nAdvantages ✅ Fully anonymous (no information needed) ✅ PHP + flat file (no DB required) ✅ Open source Disadvantages ❌ Requires PHP (unsuitable for static sites) ❌ No GitHub login ❌ Old project Implementation Difficulty ⭐⭐⭐⭐ (4/5)\nRecommended For ❌ Not recommended for static blogs Consider only if you have a PHP server 8. 
Hyvor Talk (Premium SaaS) Concept Ad-free premium comment system\nAdvantages ✅ No ads ✅ Anonymous comments supported ✅ Social logins ✅ Powerful spam filter Disadvantages ❌ Paid ($5/month~) ❌ No GitHub login Cost Starter: $5/month (1 site) Pro: $15/month (3 sites) Recommended For Paid alternative to Disqus Want ad-free SaaS Comparison Tables By Authentication Method System Anonymous GitHub Google Other Social Difficulty Cost Giscus ❌ ✅ ❌ ❌ ⭐⭐ Free Utterances ❌ ✅ ❌ ❌ ⭐⭐ Free Remark42 ✅ ✅ ✅ ✅ ⭐⭐⭐⭐ $5/mo Comentario ✅ ✅ ✅ ✅ ⭐⭐⭐⭐ $5/mo Disqus ⚠️ ❌ ✅ ✅ ⭐ Free (ads) Cusdis ✅ ❌ ❌ ❌ ⭐⭐⭐ Free Hyvor Talk ✅ ❌ ✅ ✅ ⭐ $5/mo By Features System Markdown Reactions Voting Notifications Admin Spam Filter Giscus ✅ ✅ ❌ ✅ ⚠️ ✅ Remark42 ✅ ❌ ✅ ✅ ✅ ✅ Disqus ⚠️ ❌ ✅ ✅ ✅ ✅ Cusdis ❌ ❌ ❌ ⚠️ ✅ ⚠️ By Hosting System Hosting Data Location Dependency Giscus GitHub GitHub Discussions GitHub Remark42 Self Your server Docker Disqus Disqus Disqus servers Disqus Cusdis Vercel Vercel DB Vercel Selection Guide Scenario-Based Recommendations 1. \u0026ldquo;Developer blog, targeting GitHub users\u0026rdquo; → Giscus ⭐⭐⭐⭐⭐\nFree, simple, Markdown support GitHub integration makes notifications convenient 2. \u0026ldquo;General blog, anonymous comments essential\u0026rdquo; → Cusdis (simple) or Remark42 (advanced)\nCusdis: 5-minute setup, completely free Remark42: More features, includes social login 3. \u0026ldquo;Both anonymous + GitHub login\u0026rdquo; → Remark42 ⭐⭐⭐⭐⭐\nSupports both with the most complete admin tooling (Comentario also supports both, with fewer features) 4. \u0026ldquo;No technical skills, quick setup\u0026rdquo; → Disqus\n5-minute setup Accept ads 5. \u0026ldquo;Completely free + don\u0026rsquo;t want server management\u0026rdquo; → Giscus (GitHub) or Cusdis (anonymous)\n6. \u0026ldquo;Privacy is top priority\u0026rdquo; → Remark42 or Comentario (self-hosted)\nComplete data control Practical Implementation: Blowfish + Giscus Complete Setup Process 1. Enable GitHub Discussions GitHub Repository → Settings → Features → Check Discussions 2. 
Install Giscus App Visit https://github.com/apps/giscus → Install → Select repository 3. Generate Giscus Configuration At giscus.app:\nRepository: 0AndWild/0AndWild.github.io Mapping: pathname Category: Announcements Theme: preferred_color_scheme Language: en Copy generated code\n4. Create File # Create directory (if not exists) mkdir -p layouts/partials # Create file touch layouts/partials/comments.html 5. Insert Code \u0026lt;!-- layouts/partials/comments.html --\u0026gt; \u0026lt;script src=\u0026#34;https://giscus.app/client.js\u0026#34; data-repo=\u0026#34;0AndWild/0AndWild.github.io\u0026#34; data-repo-id=\u0026#34;R_xxxxxxxxxxxxx\u0026#34; data-category=\u0026#34;Announcements\u0026#34; data-category-id=\u0026#34;DIC_xxxxxxxxxxxxx\u0026#34; data-mapping=\u0026#34;pathname\u0026#34; data-strict=\u0026#34;0\u0026#34; data-reactions-enabled=\u0026#34;1\u0026#34; data-emit-metadata=\u0026#34;0\u0026#34; data-input-position=\u0026#34;bottom\u0026#34; data-theme=\u0026#34;preferred_color_scheme\u0026#34; data-lang=\u0026#34;en\u0026#34; crossorigin=\u0026#34;anonymous\u0026#34; async\u0026gt; \u0026lt;/script\u0026gt; 6. Modify params.toml [article] showComments = true 7. Local Testing hugo server -D # Check at http://localhost:1313 8. Deploy git add . git commit -m \u0026#34;Add Giscus comments\u0026#34; git push Practical Implementation: Blowfish + Remark42 (Railway) Complete Setup Process 1. Create GitHub OAuth App GitHub → Settings → Developer settings → OAuth Apps → New OAuth App Application name: AndWild Blog Comments Homepage URL: https://0andwild.github.io Authorization callback URL: https://your-remark42.railway.app/auth/github/callback After creation: Copy Client ID Generate and copy Client Secret 2. Deploy to Railway 1. Sign up for railway.app 2. \u0026#34;New Project\u0026#34; → \u0026#34;Deploy Docker Image\u0026#34; 3. Image: umputun/remark42:latest 4. 
Add environment variables: REMARK_URL=https://your-project.railway.app SECRET=randomly-generated-secret-key-change-this SITE=0andwild-blog AUTH_ANON=true AUTH_GITHUB_CID=your_github_client_id AUTH_GITHUB_CSEC=your_github_client_secret ADMIN_SHARED_ID=github_yourusername 3. Verify Deployment Railway automatically generates URL: https://your-project.railway.app Access in browser to verify Remark42 UI 4. Configure Blowfish mkdir -p layouts/partials touch layouts/partials/comments.html \u0026lt;!-- layouts/partials/comments.html --\u0026gt; \u0026lt;div id=\u0026#34;remark42\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; \u0026lt;script\u0026gt; var remark_config = { host: \u0026#39;https://your-project.railway.app\u0026#39;, site_id: \u0026#39;0andwild-blog\u0026#39;, components: [\u0026#39;embed\u0026#39;], theme: \u0026#39;light\u0026#39;, locale: \u0026#39;en\u0026#39; }; (function(c) { for(var i = 0; i \u0026lt; c.length; i++){ var d = document, s = d.createElement(\u0026#39;script\u0026#39;); s.src = remark_config.host + \u0026#39;/web/\u0026#39; +c[i] +\u0026#39;.js\u0026#39;; s.defer = true; (d.head || d.body).appendChild(s); } })(remark_config.components || [\u0026#39;embed\u0026#39;]); \u0026lt;/script\u0026gt; 5. params.toml [article] showComments = true 6. Test and Deploy hugo server -D # After verification git add . git commit -m \u0026#34;Add Remark42 comments\u0026#34; git push Migration Guide Disqus → Giscus 1. Export data from Disqus (XML) 2. Manual migration to GitHub Discussions (No automation script, manual work required) Disqus → Remark42 1. Disqus XML Export 2. Remark42 Admin → Import → Select Disqus 3. 
Upload XML file Conclusion Final Recommendations Situation Recommended System Reason Developer blog Giscus Free, GitHub integration, Markdown General blog (anonymous needed) Cusdis Free, simple, anonymous Both anonymous + social Remark42 Flexible, all features Quick setup Disqus 5-minute completion (accept ads) Complete control Remark42 Self-hosted, customizable Personal Recommendation (0AndWild Blog) Giscus recommended\nPerfect fit for GitHub Pages blog Tech blog\u0026rsquo;s main audience is GitHub users Free, simple, no maintenance Alternative: Remark42 (when anonymous comments desired)\nQuick Start Start with Giscus (10 minutes) Collect user feedback Consider switching to Remark42 if many requests for anonymous comments Comment systems can be changed later, so strongly recommend starting with Giscus!\n","date":"2025-10-17T11:00:00+09:00","image":"/posts/251017_comments_guide/featured.png","permalink":"/en/posts/251017_comments_guide/","title":"Complete Guide to Comment Systems for Static Blogs"},{"content":"Overview This guide analyzes methods to add subscription and email notification features to blogs built with static site generators (Hugo). We\u0026rsquo;ll cover everything from basic subscriptions to keyword-based selective notifications.\n1. RSS Feed + Email Services Concept Leverage services that convert Hugo\u0026rsquo;s built-in RSS Feed into email notifications.\nMethod A: Blogtrottr How It Works 1. Hugo automatically generates RSS Feed (index.xml) ↓ 2. Users register RSS URL on Blogtrottr ↓ 3. Blogtrottr periodically checks RSS ↓ 4. 
Sends email when new posts detected Advantages ✅ No developer work (just provide link) ✅ Completely free ✅ Works immediately ✅ No server required Disadvantages ❌ No subscriber management ❌ No email design customization ❌ No analytics ❌ No keyword filtering ❌ Users must register on external site Implementation Difficulty ⭐ (1/5) - Easiest\nUsage Example Add link to blog: [Subscribe via Email](https://blogtrottr.com) (Enter https://0andwild.github.io/index.xml on the site) Method B: FeedBurner (Google) How It Works 1. Register RSS Feed with FeedBurner ↓ 2. FeedBurner proxies/manages RSS ↓ 3. Embed subscription form on blog ↓ 4. Users subscribe directly from blog ↓ 5. Auto-sends email when new posts published Advantages ✅ Basic analytics provided ✅ Subscription form provided ✅ Free ✅ RSS management features Disadvantages ❌ Google may discontinue support (updates stopped) ❌ No keyword filtering ❌ Limited customization ❌ Outdated UI Implementation Difficulty ⭐⭐ (2/5)\n2. Mailchimp + RSS Campaign (Recommended) Concept Leverage professional email marketing platform to automatically convert RSS Feed to emails\nHow It Works 1. Create RSS Campaign in Mailchimp ↓ 2. Register RSS URL and set check frequency (daily/weekly/monthly) ↓ 3. Embed Mailchimp subscription form on blog ↓ 4. Users enter email to subscribe ↓ 5. Auto-generates email template when new post detected ↓ 6. 
Sends to all subscribers Advantages ✅ Free tier: Up to 2,000 subscribers ✅ Professional email design (drag-and-drop editor) ✅ Subscriber management (add/delete/segment) ✅ Detailed analytics (open rate, click rate, unsubscribe rate) ✅ Auto-generated subscription forms (embed code provided) ✅ Automation (only sends on new posts) ✅ Mobile optimized ✅ Spam filter avoidance (professional sending servers) Disadvantages ❌ No keyword filtering by default (tag-based segmentation on Pro plan) ❌ Mailchimp logo shown on free tier ❌ Paid after 2,000 subscribers ($13/month+) Implementation Difficulty ⭐⭐ (2/5)\nSetup Steps 1. Create Mailchimp account 2. Create Audience 3. Campaign → Create → Email → RSS Campaign 4. Enter RSS URL: https://your-blog.com/index.xml 5. Set sending frequency (Daily/Weekly) 6. Design email template 7. Copy subscription form code 8. Insert in Hugo (layouts/partials/subscribe.html) Blog Embed Code Example \u0026lt;!-- Mailchimp subscription form --\u0026gt; \u0026lt;div id=\u0026#34;mc_embed_signup\u0026#34;\u0026gt; \u0026lt;form action=\u0026#34;https://your-mailchimp-url.com/subscribe\u0026#34; method=\u0026#34;post\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;email\u0026#34; name=\u0026#34;EMAIL\u0026#34; placeholder=\u0026#34;Email address\u0026#34; required\u0026gt; \u0026lt;button type=\u0026#34;submit\u0026#34;\u0026gt;Subscribe\u0026lt;/button\u0026gt; \u0026lt;/form\u0026gt; \u0026lt;/div\u0026gt; 3. Buttondown (Developer-Friendly, Recommended) Concept Markdown-based newsletter platform with API for customization\nHow It Works 1. Connect RSS Feed to Buttondown ↓ 2. Auto-converts RSS items to Markdown emails ↓ 3. Subscribers can select tags/keywords ↓ 4. Filter subscribers by specific tags via API ↓ 5. 
Send only to matching subscribers Advantages ✅ Free tier: Up to 1,000 subscribers ✅ Markdown-based (developer-friendly) ✅ Powerful API (customizable) ✅ Tag-based subscriptions (keyword filtering possible) ✅ No ads ✅ Clean UI ✅ RSS import automation ✅ Privacy-focused Disadvantages ❌ Simple email design (Markdown only) ❌ Analytics weaker than Mailchimp ❌ Limited Korean support Implementation Difficulty ⭐⭐⭐ (3/5) - Increases with API usage\nKeyword Notification Example Step 1: Add tag selection to subscription form \u0026lt;form action=\u0026#34;https://buttondown.email/api/emails/embed-subscribe/YOUR_ID\u0026#34; method=\u0026#34;post\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;email\u0026#34; name=\u0026#34;email\u0026#34; placeholder=\u0026#34;Email\u0026#34; required\u0026gt; \u0026lt;label\u0026gt;Select topics of interest:\u0026lt;/label\u0026gt; \u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;tags\u0026#34; value=\u0026#34;kubernetes\u0026#34;\u0026gt; Kubernetes \u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;tags\u0026#34; value=\u0026#34;docker\u0026#34;\u0026gt; Docker \u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;tags\u0026#34; value=\u0026#34;golang\u0026#34;\u0026gt; Go \u0026lt;button type=\u0026#34;submit\u0026#34;\u0026gt;Subscribe\u0026lt;/button\u0026gt; \u0026lt;/form\u0026gt; Step 2: Selective sending via GitHub Actions name: Send Newsletter on: push: paths: - \u0026#39;content/posts/**\u0026#39; jobs: send: runs-on: ubuntu-latest steps: - name: Extract tags from post run: | TAGS=$(grep \u0026#34;^tags = \u0026#34; content/posts/*/index.md | cut -d\u0026#39;\u0026#34;\u0026#39; -f2) echo \u0026#34;POST_TAGS=$TAGS\u0026#34; \u0026gt;\u0026gt; $GITHUB_ENV - name: Send to matching subscribers run: | curl -X POST https://api.buttondown.email/v1/emails \\ -H \u0026#34;Authorization: Token ${{ secrets.BUTTONDOWN_API_KEY }}\u0026#34; \\ -d \u0026#34;subject=New Post\u0026#34; \\ -d 
\u0026#34;body=...\u0026#34; \\ -d \u0026#34;tag=$POST_TAGS\u0026#34; 4. SendGrid + GitHub Actions (Fully Custom) Concept Build fully customized notification system combining email sending API with CI/CD\nHow It Works 1. Write new post and Git Push ↓ 2. GitHub Actions triggered ↓ 3. Action parses Front Matter - Extract title, summary, tags ↓ 4. Query subscriber DB (Supabase/JSON file) - Match each subscriber\u0026#39;s keywords ↓ 5. Filter matching subscribers only ↓ 6. Send individual emails via SendGrid API Advantages ✅ Complete control (customize all logic) ✅ Perfect keyword notification implementation ✅ Free tier: SendGrid 100 emails/day ✅ Automation (just git push) ✅ Scalable (DB, logic freely customizable) ✅ Own subscriber data Disadvantages ❌ Development work required ❌ Maintenance burden ❌ SendGrid free tier limited (100 emails/day) ❌ Must implement subscription form and DB yourself ❌ Spam filter avoidance setup needed Implementation Difficulty ⭐⭐⭐⭐⭐ (5/5) - Most complex\nArchitecture Subscriber Database Options Option A: JSON File (Simple)\n// subscribers.json (encrypted in GitHub repository) [ { \u0026#34;email\u0026#34;: \u0026#34;user@example.com\u0026#34;, \u0026#34;keywords\u0026#34;: [\u0026#34;kubernetes\u0026#34;, \u0026#34;docker\u0026#34;], \u0026#34;active\u0026#34;: true }, { \u0026#34;email\u0026#34;: \u0026#34;dev@example.com\u0026#34;, \u0026#34;keywords\u0026#34;: [\u0026#34;golang\u0026#34;, \u0026#34;rust\u0026#34;], \u0026#34;active\u0026#34;: true } ] Option B: Supabase (Recommended)\n-- subscribers table CREATE TABLE subscribers ( id UUID PRIMARY KEY, email TEXT UNIQUE NOT NULL, keywords TEXT[], -- array type active BOOLEAN DEFAULT true, created_at TIMESTAMP DEFAULT NOW() ); GitHub Actions Workflow name: Email Notification on: push: branches: [main] paths: - \u0026#39;content/posts/**\u0026#39; jobs: notify: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Setup Node.js uses: actions/setup-node@v3 with: 
node-version: \u0026#39;18\u0026#39; - name: Extract Post Metadata id: metadata run: | # Find most recently modified post POST_FILE=$(git diff-tree --no-commit-id --name-only -r ${{ github.sha }} | grep \u0026#39;content/posts\u0026#39; | head -1) # Parse Front Matter TITLE=$(grep \u0026#34;^title = \u0026#34; $POST_FILE | cut -d\u0026#39;\u0026#34;\u0026#39; -f2) TAGS=$(grep \u0026#34;^tags = \u0026#34; $POST_FILE | sed \u0026#39;s/tags = \\[//;s/\\]//;s/\u0026#34;//g\u0026#39;) SUMMARY=$(grep \u0026#34;^summary = \u0026#34; $POST_FILE | cut -d\u0026#39;\u0026#34;\u0026#39; -f2) URL=\u0026#34;https://0andwild.github.io/$(dirname $POST_FILE | sed \u0026#39;s/content\\///\u0026#39;)\u0026#34; echo \u0026#34;title=$TITLE\u0026#34; \u0026gt;\u0026gt; $GITHUB_OUTPUT echo \u0026#34;tags=$TAGS\u0026#34; \u0026gt;\u0026gt; $GITHUB_OUTPUT echo \u0026#34;summary=$SUMMARY\u0026#34; \u0026gt;\u0026gt; $GITHUB_OUTPUT echo \u0026#34;url=$URL\u0026#34; \u0026gt;\u0026gt; $GITHUB_OUTPUT - name: Query Matching Subscribers id: subscribers run: | # Query matching subscribers from Supabase curl -X POST https://YOUR_PROJECT.supabase.co/rest/v1/rpc/get_matching_subscribers \\ -H \u0026#34;apikey: ${{ secrets.SUPABASE_KEY }}\u0026#34; \\ -H \u0026#34;Content-Type: application/json\u0026#34; \\ -d \u0026#34;{\\\u0026#34;post_tags\\\u0026#34;: \\\u0026#34;${{ steps.metadata.outputs.tags }}\\\u0026#34;}\u0026#34; \\ \u0026gt; subscribers.json - name: Send Emails via SendGrid run: | # Execute Node.js script cat \u0026gt; send-emails.js \u0026lt;\u0026lt; \u0026#39;EOF\u0026#39; const sgMail = require(\u0026#39;@sendgrid/mail\u0026#39;); const fs = require(\u0026#39;fs\u0026#39;); sgMail.setApiKey(process.env.SENDGRID_API_KEY); const subscribers = JSON.parse(fs.readFileSync(\u0026#39;subscribers.json\u0026#39;)); const title = process.env.POST_TITLE; const summary = process.env.POST_SUMMARY; const url = process.env.POST_URL; subscribers.forEach(async (subscriber) =\u0026gt; { const msg = { 
to: subscriber.email, from: \u0026#39;noreply@0andwild.github.io\u0026#39;, subject: `New Post: ${title}`, html: ` \u0026lt;h2\u0026gt;${title}\u0026lt;/h2\u0026gt; \u0026lt;p\u0026gt;${summary}\u0026lt;/p\u0026gt; \u0026lt;p\u0026gt;Matched keywords: ${subscriber.matched_keywords.join(\u0026#39;, \u0026#39;)}\u0026lt;/p\u0026gt; \u0026lt;a href=\u0026#34;${url}\u0026#34;\u0026gt;Read Post\u0026lt;/a\u0026gt; \u0026lt;hr\u0026gt; \u0026lt;small\u0026gt;\u0026lt;a href=\u0026#34;https://0andwild.github.io/unsubscribe?token=${subscriber.token}\u0026#34;\u0026gt;Unsubscribe\u0026lt;/a\u0026gt;\u0026lt;/small\u0026gt; ` }; await sgMail.send(msg); console.log(`Email sent to ${subscriber.email}`); }); EOF npm install @sendgrid/mail node send-emails.js env: SENDGRID_API_KEY: ${{ secrets.SENDGRID_API_KEY }} POST_TITLE: ${{ steps.metadata.outputs.title }} POST_SUMMARY: ${{ steps.metadata.outputs.summary }} POST_URL: ${{ steps.metadata.outputs.url }} Subscription Form Implementation (Hugo Shortcode) \u0026lt;!-- layouts/shortcodes/subscribe.html --\u0026gt; \u0026lt;div class=\u0026#34;subscription-form\u0026#34;\u0026gt; \u0026lt;h3\u0026gt;Subscribe to Blog\u0026lt;/h3\u0026gt; \u0026lt;form id=\u0026#34;subscribe-form\u0026#34;\u0026gt; \u0026lt;input type=\u0026#34;email\u0026#34; id=\u0026#34;email\u0026#34; placeholder=\u0026#34;Email address\u0026#34; required\u0026gt; \u0026lt;fieldset\u0026gt; \u0026lt;legend\u0026gt;Select topics of interest (get notified only for selected topics)\u0026lt;/legend\u0026gt; \u0026lt;label\u0026gt;\u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;keywords\u0026#34; value=\u0026#34;kubernetes\u0026#34;\u0026gt; Kubernetes\u0026lt;/label\u0026gt; \u0026lt;label\u0026gt;\u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;keywords\u0026#34; value=\u0026#34;docker\u0026#34;\u0026gt; Docker\u0026lt;/label\u0026gt; \u0026lt;label\u0026gt;\u0026lt;input type=\u0026#34;checkbox\u0026#34; 
name=\u0026#34;keywords\u0026#34; value=\u0026#34;golang\u0026#34;\u0026gt; Go\u0026lt;/label\u0026gt; \u0026lt;label\u0026gt;\u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;keywords\u0026#34; value=\u0026#34;rust\u0026#34;\u0026gt; Rust\u0026lt;/label\u0026gt; \u0026lt;label\u0026gt;\u0026lt;input type=\u0026#34;checkbox\u0026#34; name=\u0026#34;keywords\u0026#34; value=\u0026#34;devops\u0026#34;\u0026gt; DevOps\u0026lt;/label\u0026gt; \u0026lt;/fieldset\u0026gt; \u0026lt;button type=\u0026#34;submit\u0026#34;\u0026gt;Subscribe\u0026lt;/button\u0026gt; \u0026lt;/form\u0026gt; \u0026lt;script\u0026gt; document.getElementById(\u0026#39;subscribe-form\u0026#39;).addEventListener(\u0026#39;submit\u0026#39;, async (e) =\u0026gt; { e.preventDefault(); const email = document.getElementById(\u0026#39;email\u0026#39;).value; const keywords = Array.from(document.querySelectorAll(\u0026#39;input[name=\u0026#34;keywords\u0026#34;]:checked\u0026#39;)) .map(cb =\u0026gt; cb.value); // Save to Supabase const response = await fetch(\u0026#39;https://YOUR_PROJECT.supabase.co/rest/v1/subscribers\u0026#39;, { method: \u0026#39;POST\u0026#39;, headers: { \u0026#39;apikey\u0026#39;: \u0026#39;YOUR_ANON_KEY\u0026#39;, \u0026#39;Content-Type\u0026#39;: \u0026#39;application/json\u0026#39; }, body: JSON.stringify({ email, keywords, active: true }) }); if (response.ok) { alert(\u0026#39;Successfully subscribed!\u0026#39;); } else { alert(\u0026#39;An error occurred.\u0026#39;); } }); \u0026lt;/script\u0026gt; \u0026lt;/div\u0026gt; Supabase Function (Keyword Matching) -- Function to find matching subscribers CREATE OR REPLACE FUNCTION get_matching_subscribers(post_tags TEXT) RETURNS TABLE(email TEXT, matched_keywords TEXT[], token TEXT) AS $$ BEGIN RETURN QUERY SELECT s.email, ARRAY( SELECT unnest(s.keywords) INTERSECT SELECT unnest(string_to_array(post_tags, \u0026#39;,\u0026#39;)) ) as matched_keywords, s.unsubscribe_token as token FROM subscribers s WHERE s.active = 
true AND s.keywords \u0026amp;\u0026amp; string_to_array(post_tags, \u0026#39;,\u0026#39;) -- array overlap operator ; END; $$ LANGUAGE plpgsql; Cost Analysis SendGrid: 100 emails/month free (then $19.95/month) Supabase: 500MB DB, 2GB transfer free per month GitHub Actions: 2,000 minutes/month free Total cost: Completely free (for small blogs) 5. Fully Custom (Supabase + GitHub Actions + Resend) SendGrid Alternative: Resend More developer-friendly modern email API than SendGrid\nAdvantages ✅ Free tier: 3,000 emails/month (30x more than SendGrid!) ✅ Simpler API ✅ React Email support (write emails in JSX) ✅ Better developer experience Resend Usage Example import { Resend } from \u0026#39;resend\u0026#39;; const resend = new Resend(process.env.RESEND_API_KEY); await resend.emails.send({ from: \u0026#39;blog@0andwild.github.io\u0026#39;, to: subscriber.email, subject: `New Post: ${title}`, html: `\u0026lt;p\u0026gt;${summary}\u0026lt;/p\u0026gt;\u0026lt;a href=\u0026#34;${url}\u0026#34;\u0026gt;Read\u0026lt;/a\u0026gt;` }); Comparison Table Method Free Limit Keyword Alerts Difficulty Subscriber Mgmt Custom Recommend Blogtrottr Unlimited ❌ ⭐ ❌ ❌ Testing only FeedBurner Unlimited ❌ ⭐⭐ ⚠️ ⚠️ Not recommended (discontinued) Mailchimp 2,000 ⚠️ (Pro) ⭐⭐ ✅ ⚠️ General subscriptions Buttondown 1,000 ✅ ⭐⭐⭐ ✅ ✅ For developers SendGrid + Actions 100/month ✅ ⭐⭐⭐⭐⭐ ✅ ✅✅ Advanced users Resend + Actions 3,000/month ✅ ⭐⭐⭐⭐⭐ ✅ ✅✅ Perfect control Recommended Roadmap Stage 1: Quick Start (Immediate) Mailchimp RSS Campaign\n10-minute setup All subscribers get all posts Stage 2: Improvement (After 1 week) Migrate to Buttondown\nCleaner experience Basic tag features Stage 3: Advanced Features (When needed) Resend + GitHub Actions + Supabase\nKeyword-based selective notifications Complete control Scalability Conclusion For general bloggers: → Mailchimp (easiest and most professional)\nFor developer blogs: → Buttondown (developer-friendly, provides API)\nIf keyword alerts are essential: → 
Resend + GitHub Actions + Supabase (fully custom)\nTo test without spending money: → Blogtrottr (30-second setup)\nQuick Start If you want actual implementation:\nStart with Mailchimp (low learning curve) Consider Buttondown when traffic grows Build custom solution when advanced features needed Keyword alerts may be overkill initially, so it\u0026rsquo;s recommended to start with basic subscriptions.\n","date":"2025-10-17T10:00:00+09:00","image":"/posts/251017_subscription_alert/featured.jpg","permalink":"/en/posts/251017_subscription_alert/","title":"Complete Guide to Blog Subscription and Email Notification Systems"},{"content":" Heading (H2) Subheading (H3) Regular text. Bold, Italic, Strikethrough\nImages Method 1: Local Image Place image file in the post folder:\n![Image description](image.jpg) Method 2: External Image URL ![Image description](https://example.com/image.jpg) Method 3: HTML Tag (with size control) \u0026lt;img src=\u0026#34;image.jpg\u0026#34; alt=\u0026#34;Image description\u0026#34; width=\u0026#34;500\u0026#34; /\u0026gt; Carousel Images (Slideshow) 16:9 21:9 Code Blocks Inline Code Use inline code format\nCode Blocks package main import \u0026#34;fmt\u0026#34; func main() { fmt.Println(\u0026#34;Hello, World!\u0026#34;) } def hello(): print(\u0026#34;Hello, World!\u0026#34;) docker run -d -p 8080:80 nginx Links Basic Link Link text\nReference Style Link Link text\nArticle Reference /docs/welcome/Open linked article Lists Unordered List Item 1 Item 2 Sub-item 2-1 Sub-item 2-2 Item 3 Ordered List First Second Third Checklist Todo 1 Completed Todo 2 Blockquote This is a blockquote. 
Multiple lines are supported.\nTable Item Description Note A Description A Note A B Description B Note B Embedded Links (Shortcodes) YouTube Video {{\u0026lt; youtube VIDEO_ID \u0026gt;}}\nTwitter/X {{\u0026lt; twitter user=\u0026ldquo;username\u0026rdquo; id=\u0026ldquo;tweet_id\u0026rdquo; \u0026gt;}}\nGitHub Gist {{\u0026lt; gist username gist_id \u0026gt;}}\nAlert Boxes (Blowfish Alert) {{\u0026lt; alert \u0026ldquo;circle-info\u0026rdquo; \u0026gt;}} Information alert. {{\u0026lt; /alert \u0026gt;}}\n{{\u0026lt; alert \u0026ldquo;lightbulb\u0026rdquo; \u0026gt;}} Tips and ideas. {{\u0026lt; /alert \u0026gt;}}\n{{\u0026lt; alert \u0026ldquo;triangle-exclamation\u0026rdquo; \u0026gt;}} Warning message. {{\u0026lt; /alert \u0026gt;}}\nCollapsible (Details) Click to expand Hidden content appears here.\nComments Horizontal Rule Use to create dividing lines:\nFootnotes You can add footnotes1 to text.\nChart Mermaid Diagram graph LR; A[Lemons]--\u003eB[Lemonade]; B--\u003eC[Profit] Swatches (color showcase) #64748b #3b82f6 #06b6d4 TypeIt (Ex1)\n\u0026lt;p id=\"stack-typeit-6\" class=\"stack-typeit stack-typeit-cursor\"\u003e\u0026lt;/p\u003e (Ex2)\n\u0026lt;h1 id=\"stack-typeit-7\" class=\"stack-typeit stack-typeit-cursor\"\u003e\u0026lt;/h1\u003e (Ex3)\n\u0026lt;h3 id=\"stack-typeit-8\" class=\"stack-typeit stack-typeit-cursor\"\u003e\u0026lt;/h3\u003e Youtube Lite Writing Tips:\nChange draft: true to false in front matter to publish Writing description and summary helps with SEO It\u0026rsquo;s recommended to place images in the post folder This is the footnote content.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":"2025-10-16T18:36:52+09:00","image":"/posts/251016_blowfish_markdown/featured.png","permalink":"/en/posts/251016_blowfish_markdown/","title":"Hugo Markdown Guide"},{"content":"Why I Moved to Hugo \u0026amp; GitHub Pages I decided to migrate my tech blog from Tstory to Hugo \u0026amp; GitHub Pages.\n1. 
Scattered Content Management As I used various note-taking tools, the content I wrote at work or while studying became scattered everywhere. Having to transfer these notes to the blog repeatedly became burdensome, which led me to neglect blog management.\n2. Markdown Compatibility Issues The markdown syntax used in my note-taking tools wasn\u0026rsquo;t fully compatible with Tstory, requiring frequent modifications when publishing posts. This was also a source of frustration.\nSpecific issues included:\nInsufficient syntax highlighting support for code blocks Table rendering errors Image path handling problems Limited mathematical expression support 3. Tstory Open API Support Ended Recently, I wanted to reorganize my study materials and post them on Tstory while redesigning the blog skin. I also planned to integrate existing note-taking tools using the Tstory Official Open API. However, I discovered that Open API support had been discontinued, and there was no longer a reason to continue using Tstory.\nBlog Platform Selection Criteria After researching various blogs and considering what approach would work best, I settled on Hugo \u0026amp; GitHub Pages based on the following criteria:\nIs it easy to set up the blog? Can it be managed with code? Is there high flexibility to add features I want? Is the build and deployment speed fast when using GitHub Pages? Is it easy to integrate with note-taking tools like Obsidian? What is Hugo? 
Hugo is a fast and flexible Static Site Generator written in the Go language.\nKey Features:\nFast build speed: Thousands of pages can be built in seconds Simple structure: Write content in Markdown and Hugo converts it to HTML Zero dependencies: Runs as a single binary without requiring separate runtime or database Rich theme ecosystem: Easy to apply themes for various purposes Comparing Static Site Generators Used with GitHub Pages Feature Hugo Jekyll Gatsby Next.js (SSG) VuePress Language Go Ruby React (JavaScript) React (JavaScript) Vue.js Build Speed ⚡ Very Fast (\u0026lt; 1ms/page) 🐢 Slow 🚶 Moderate 🚶 Moderate 🚶 Moderate Installation Complexity ✅ Single Binary ⚠️ Ruby environment required ⚠️ Node.js + many dependencies ⚠️ Node.js + dependencies ⚠️ Node.js + dependencies GitHub Pages Native Support ❌ (Actions required) ✅ Native support ❌ (Actions required) ❌ (Actions required) ❌ (Actions required) Learning Curve Low Low High Medium-High Medium Themes/Plugins Rich Very Rich Rich (React ecosystem) Rich (React ecosystem) Moderate Best For Blog, Docs, Portfolio Blog, GitHub default Complex web apps, Blog Complex web apps, Hybrid Technical docs Build Time (1000 pages) ~1s ~2min ~30s ~30s ~20s Why I Chose Hugo:\nOverwhelming build speed: Build time barely increases even with more content Simple setup: Focus on Markdown without complex JavaScript frameworks Zero dependencies: No environment setup issues with a single executable Rich themes: Easy to apply high-quality themes like Blowfish GitHub Pages Deployment Blogs written with Hugo are automatically built and deployed through GitHub Actions.\nDeployment Workflow Push changes to the main branch GitHub Actions automatically triggers Hugo builds the static site Built files are automatically deployed to GitHub Pages Benefits Automated deployment: Automatically deploys when you push code Version control: Track all changes through Git Free hosting: GitHub Pages is provided for free Custom domain: Can connect your 
desired domain HTTPS support: HTTPS is provided by default Obsidian Integration Hugo is markdown-based, making it perfectly compatible with note-taking tools like Obsidian.\nIntegration Method Set Hugo blog\u0026rsquo;s content/posts directory as Obsidian vault Write and edit posts in Obsidian When finished writing, commit \u0026amp; push through Git GitHub Actions automatically builds and deploys Benefits Consistent writing environment: Manage all notes and blog posts in the same tool Perfect markdown compatibility: No additional conversion work needed Local-first: Can write posts without internet connection Powerful linking features: Utilize Obsidian\u0026rsquo;s backlinks and graph view Terminal Commands Running Development Server hugo server Starts a local development server. By default, you can view the site at http://localhost:1313.\nKey Options:\n-D or --buildDrafts: Builds draft content as well --bind 0.0.0.0: Makes server accessible from all network interfaces --port 8080: Uses a different port instead of the default (1313) Browser automatically refreshes when files change (Live Reload) Examples:\nhugo server -D hugo server --bind 0.0.0.0 --port 8080 Production Build hugo --cleanDestinationDir Builds the static site for production. 
Output is generated in the public/ directory.\nKey Features:\n--cleanDestinationDir: Completely cleans the destination directory (public/) before building Removes unnecessary files from previous builds for a clean build Ensures no old versions of files remain even when filenames are changed or deleted Examples:\nhugo --cleanDestinationDir hugo --cleanDestinationDir --minify # Add file minification option Theme Information Hugo Blowfish Theme This blog uses the Blowfish theme.\nFeatures:\nProvides modern and responsive design Supports dark mode Fast loading speed and SEO optimized Multilingual support Rich customization options Configuration Files:\nconfig/_default/hugo.toml - Basic Hugo configuration config/_default/params.toml - Blowfish theme parameters config/_default/languages.en.toml - Language-specific settings config/_default/menus.en.toml - Menu configuration Conclusion The migration from Tstory to Hugo \u0026amp; GitHub Pages was a choice for a developer-friendly environment. Now I can manage my blog the same way I manage code versions, and with perfect Obsidian integration, I\u0026rsquo;ve unified the workflow from note-taking to blog posting.\nAbove all, Hugo\u0026rsquo;s fast build speed and GitHub Actions\u0026rsquo; automated deployment allow me to focus solely on writing, and I can freely customize without being bound by platform constraints.\nGoing forward, I plan to gradually migrate existing posts from Tstory while steadily adding new content.\nReferences Hugo Official Site: https://gohugo.io/ Blowfish Theme: https://blowfish.page/ Blowfish Creator: @nunocoracao Creator Blog: https://n9o.xyz/ Official Docs: https://blowfish.page/docs/ License: MIT License ","date":"2025-10-15T17:21:09+09:00","image":"/posts/251015_about_hugo/featured.png","permalink":"/en/posts/251015_about_hugo/","title":"Why I Switched to Hugo \u0026 GitHub Pages"},{"content":"1) Kind of Highlighter ES supports highlighting for search results and provides 3 methods.\nEach type 
below can be applied separately to each field to be highlighted\nUnified (Default Highlighter) The unified highlighter uses the Lucene unified highlighter. This highlighter breaks text into sentences and scores individual sentences as if they were documents in a corpus using the BM25 algorithm. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting.\nFeatures\nBM25 algorithm-based sentence scoring Accurate phrase highlighting support Multi-term query support (fuzzy, prefix, regex) Plain The plain highlighter is best suited for highlighting simple query matches in a single field.\nTo accurately reflect the query logic, it creates a small in-memory index and re-executes the original query criteria through Lucene\u0026rsquo;s query execution planner to access low-level match information for the current document.\nThis operation is repeated for every field and every document that needs to be highlighted. For highlighting many fields in many documents with complex queries, it\u0026rsquo;s recommended to use the unified highlighter with postings or term_vector fields.\nFeatures\nSuitable for simple queries on single fields Creates in-memory index Repeatedly executes for all fields and documents FVH (Fast Vector Highlighter) The fvh highlighter uses the Lucene fast vector highlighter. This highlighter can be used on fields where term_vector is set to with_positions_offsets in the mapping.\nFeatures\nCustomizable with boundary_scanner Requires term_vector to be set to with_positions_offsets, which increases index size Can combine matches from multiple fields into one result (see matched_fields) Can assign different weights to matches at different positions, such as phrase matches being ranked higher than term matches when highlighting boosting queries that boost phrase matches over term matches Note: The fvh highlighter does not support span queries. 
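All three highlighter types are selected per field inside the highlight block of a search request. A minimal sketch of such a request body, built in Python (the index and field names here are hypothetical, not from the original post):

```python
import json

# Sketch: choosing a highlighter type per field in one search request.
# "plain" suits simple single-field matches, "unified" is the default,
# and "fvh" assumes term_vector=with_positions_offsets in the mapping.
search_body = {
    "query": {"match": {"body": "lucene highlighter"}},
    "highlight": {
        "fields": {
            "title": {"type": "plain"},
            "body": {"type": "unified", "number_of_fragments": 3},
            "description": {"type": "fvh"},
        }
    },
}

print(json.dumps(search_body, indent=2))
```

Sent as the body of a _search request, the per-field type key is what lets one field use fvh while another stays on the default unified highlighter.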
If you need support for span queries, use another highlighter such as the unified highlighter.\n2) Offsets Strategy To create meaningful search snippets from the terms being searched, the highlighter needs to know the start and end character offsets of each word in the original text. These offsets can be obtained from:\nPostings List If index_options is set to offsets in the mapping, the unified highlighter uses this information to highlight documents without re-analyzing the text.\nThe original query is re-executed directly against the postings and matching offsets are extracted from the index to limit the collection to highlighted documents.\nThis is useful when you have large fields because there\u0026rsquo;s no need to re-analyze the text to be highlighted. It also requires less disk space than using term_vectors.\nAdvantages\nNo need to re-analyze text Useful for large fields Saves disk space compared to term_vectors Term Vectors If term_vector is set to with_positions_offsets in the mapping to provide term vector information, the unified highlighter automatically uses term vectors to highlight fields.\nBecause you have access to each document\u0026rsquo;s term dictionary, it\u0026rsquo;s fast especially when highlighting multi-term queries like prefixes or wildcards on large fields (over 1MB).\nThe fvh highlighter always uses term vectors.\nAdvantages\nFast performance on large fields (over 1MB) Efficient for multi-term queries (prefixes, wildcards) Direct access to document\u0026rsquo;s term dictionary Plain Highlighting This mode is used by unified when there are no other alternatives.\nIt creates a small in-memory index and re-executes the original query criteria through Lucene\u0026rsquo;s query execution planner to access low-level match information for the current document.\nThis operation is repeated for every field and every document that needs highlighting.\nThe plain highlighter always uses plain highlighting.\nFeatures\nUsed when there are no other 
options Creates in-memory index Repeatedly executes for all fields/documents References Highlighting | Elasticsearch Guide [8.13]\n","date":"2024-02-07T21:24:29+09:00","image":"/posts/240207_es/featured.png","permalink":"/en/posts/240207_es/","title":"Elasticsearch Highlighting Techniques"},{"content":"ElasticSearch Pagination: 3 Options 1. From/Size Pagination Uses from (the offset) and size (the page size) to fetch each page on demand. By default, from + size cannot exceed 10,000 results.\nThe index.max_result_window option can be raised to load more than 10,000 results, but this is not recommended.\nFeatures\nSimple implementation Limited to 10,000 results maximum Performance degradation with deep pagination 2. Search After Overcomes the pagination limitation of 10,000 results. Similar to typical cursor-based approaches.\nUses the sort condition field of search results as a key value to retrieve subsequent results.\nDrawbacks\nUsing search-after alone may result in inconsistent responses if the index is updated during pagination.\nUsing with PIT (Point In Time)\nTo address this, use PIT (Point In Time)\nPOST /my-index-000001/_pit?keep_alive=1m The above request creates a snapshot of the index at the current point in time, which can then be used with the id value as follows:\n{ \u0026#34;query\u0026#34;: {}, \u0026#34;size\u0026#34;: 100, \u0026#34;sort\u0026#34;: { \u0026#34;my_sort\u0026#34;: \u0026#34;desc\u0026#34; }, \u0026#34;search_after\u0026#34;: {}, \u0026#34;pit\u0026#34;: { \u0026#34;id\u0026#34;: \u0026#34;{{pit_value}}\u0026#34; } } Even if there are changes to the index, results are returned based on the snapshot.\nkeep_alive represents the validity period of the PIT. It\u0026rsquo;s recommended to manage PIT based on the latest point in time.\nFeatures\nCan paginate more than 10,000 results Cursor-based approach Guarantees consistent results when used with PIT Efficient for deep pagination 3. 
Scroll Note: According to ES official documentation, Search After method is recommended instead of Scroll method\nFeatures\nFor bulk data extraction Not suitable for real-time search Search After + PIT combination is now recommended References Elasticsearch Pagination Techniques: SearchAfter, Scroll, Pagination \u0026amp; PIT Elasticsearch Search After Performance Check Paginate search results | Elasticsearch Guide [7.17] Point in time API | Elasticsearch Guide [7.17] Scroll API | Elasticsearch Guide [7.17] ","date":"2024-02-06T21:16:24+09:00","image":"/posts/240206_es/featured.png","permalink":"/en/posts/240206_es/","title":"Elasticsearch Pagination Technique"},{"content":"Elasticsearch Autocomplete Methods Edge N-Gram Tokenizer Configuration\nmin_gram: 1 max_gram: 10 token_chars: letter Appropriate Use Cases\nWhen the order of terms is not important When the starting point and position of tokens are not important Edge N-Gram Token Filter Configuration\nmin_gram: 1 max_gram: 10 Appropriate Use Cases\nWhen the order of terms is not important When the starting point and position of tokens are not important Index_prefixes Parameter Configuration\nmin_chars: 1 max_chars: 10 Appropriate Use Cases\nSame as N-gram\nHowever, one difference is that the latter puts generated tokens into an additional field\nSearch-as-you-type Data Type Configuration\nmax_shingle_size: 3 Generated Tokens (Supported Sub-fields)\nExample: \u0026ldquo;real panda blog\u0026rdquo;\n._2gram additional field: real panda, panda blog (shingle token filter applied) ._3gram additional field: real panda blog (shingle token filter applied) ._index_prefix additional field: r, re, rea, real, \u0026ldquo;real \u0026ldquo;, real p, real pa, real pan, real pand, real panda, \u0026ldquo;real panda \u0026ldquo;, real panda b, real panda bl, real panda blo, real panda blog, p, pa, pan, pand, panda, \u0026ldquo;panda \u0026ldquo;, panda b, panda bl, panda blo, panda blog, \u0026ldquo;panda blog \u0026ldquo;, b, 
bl, blo, blog, \u0026ldquo;blog \u0026quot; (Applied to ._3gram field with n-gram max of 3) The most efficient query method recommended by ES is a multi-match query of bool_prefix type targeting the root field and its shingle sub-fields.\nThis query can match query terms in any order, but gives a higher score when the terms appear in order in the shingle sub-field of the document.\nIf you want to search for exact order matching of query terms and document terms, or use other properties of phrase queries, you can use the match_phrase_prefix query on the root field. This is also the case when the last term (not a prefix) must match exactly. However, this may be less efficient than using the match_bool_prefix query.\nshingle token filter defaults to 2\nAppropriate Use Cases\nWhen the order of terms is important When the starting point and position of tokens are important If no analyzer is configured during indexing, the standard analyzer is applied by default.\nSuggester API In-memory (Completion Suggester, Context Suggester) Completion suggester provides auto-complete and search-as-you-type functionality. (Does not support typo correction)\nCompletion suggester is optimized for speed and responds immediately to user typing.\nHowever, building and storing in in-memory manner incurs significant resource costs.\nTerm Suggester Used to provide search results based on suggested words when there are no results for the text entered by the user\nSuggests words for misspellings\nWords are suggested using edit distance. Edit distance is a metric that measures how similar one string is to another string.\nEdit distance is typically measured through operations of adding, deleting, and substituting each word.\nFor example, to change the string \u0026ldquo;tamming test\u0026rdquo; to \u0026ldquo;taming text\u0026rdquo;, you need 1 operation to delete m and 1 operation to change s to x. 
Therefore, the edit distance is 2.\nIf there is no term matching the indexed data, term suggest will recommend similar words.\nIn the results, text is the suggested term, and score indicates how close the suggestion is to the original input.\nAlgorithms\nElasticsearch uses the Levenshtein or Jaro-Winkler edit distance algorithms to calculate edit distance. Korean Processing\nFor Korean, terms are often not suggested properly even when using term suggest. This is fundamentally because the Korean Unicode system is complex. Korean typo handling is possible through the ICU analyzer. The ICU analyzer is specifically developed for internationalization and has built-in functions to decompose and combine Korean graphemes. However, for sophisticated features such as typo correction, Korean-English conversion, and autocomplete, it is recommended to develop separate plugins. (e.g., JavaCafe plugin) Phrase Suggester (Phrase Suggestion) Completion Suggester Autocomplete suggestion, predicts and shows search terms using autocomplete before the user completes input\nContext Suggester Contextual suggestion\nReferences Search-as-you-type, N-grams \u0026amp; Suggesters in Elasticsearch\n","date":"2024-02-04T20:59:35+09:00","image":"/posts/240204_es/featured.png","permalink":"/en/posts/240204_es/","title":"Elasticsearch Autocomplete Search Methods"},{"content":"Elasticsearch Query Processing Sequence Summary Filtered Query affects both the final search and aggregation results, but PostFiltered Query only affects the final search results and does not affect aggregation.\nQuery Processing Sequence SearchRequest → (Filtered) → Query → (PostFilter) → Result → RescoreQuery ↓ Aggregation → AggregationResult Filter Query Example { \u0026#34;query\u0026#34;: { \u0026#34;filtered\u0026#34;: { \u0026#34;filter\u0026#34;: { \u0026#34;term\u0026#34;: { \u0026#34;location\u0026#34;: \u0026#34;denver\u0026#34; } } } } } Filter Query affects both search results and 
Aggregation.\nPostFilter Query Example { \u0026#34;post_filter\u0026#34;: { \u0026#34;term\u0026#34;: { \u0026#34;location\u0026#34;: \u0026#34;denver\u0026#34; } } } PostFilter Query only affects search results and does not affect Aggregation results.\nRescore Query Parameters window_size: The number of top results to rescore per shard score_mode: The method for combining the main query score with the rescore query score ","date":"2024-02-03T21:45:06+09:00","image":"/posts/240203_es_query/featured.png","permalink":"/en/posts/240203_es_query/","title":"Elasticsearch Query Processing Sequence"},{"content":"\nToken Filter A Token Filter receives the token stream generated by the Tokenizer and adds, removes, or modifies tokens.\nWord Delimiter Graph Filter The word delimiter graph filter is designed to remove punctuation from complex identifiers like product IDs or part numbers. For these use cases, it is recommended to use it with the keyword tokenizer.\nWhen separating hyphenated words like wi-fi, it\u0026rsquo;s better not to use the word delimiter graph filter. 
Since users search both with and without hyphens, it\u0026rsquo;s better to use the synonym graph filter.\nConversion Rules Tokens are split in the following ways:\nSplit tokens at non-alphanumeric characters: Super-Duper → Super, Duper Remove leading and trailing delimiters: XL---42+'Autocoder' → XL, 42, Autocoder Split at case transitions: PowerShot → Power, Shot Split at letter-number transitions: XL500 → XL, 500 Remove English possessives: Neil's → Neil API Usage Example GET /_analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;keyword\u0026#34;, \u0026#34;filter\u0026#34;: [\u0026#34;word_delimiter_graph\u0026#34;], \u0026#34;text\u0026#34;: \u0026#34;Neil\u0026#39;s-Super-Duper-XL500--42+AutoCoder\u0026#34; } // Result -\u0026gt; [ Neil, Super, Duper, XL, 500, 42, Auto, Coder ] Custom Analyzer Configuration PUT /my-index-000001 { \u0026#34;settings\u0026#34;: { \u0026#34;analysis\u0026#34;: { \u0026#34;analyzer\u0026#34;: { \u0026#34;my_analyzer\u0026#34;: { \u0026#34;tokenizer\u0026#34;: \u0026#34;keyword\u0026#34;, \u0026#34;filter\u0026#34;: [\u0026#34;word_delimiter_graph\u0026#34;] } } } } } Configurable Parameters adjust_offsets Default: true When true, the filter adjusts the starting point of split tokens or catenated tokens to better reflect their actual position in the token stream If using filters like trim that change token length without changing offset, this should be set to false when used together catenate_all Default: false When true, the filter generates catenated tokens for alphanumeric chains separated by non-alphanumeric delimiters Example: super-duper-xl-500 → [ superduperxl500, super, duper, xl, 500 ] catenate_numbers Default: false When true, the filter generates catenated tokens for numeric character chains separated by non-alphabetic delimiters Example: 01-02-03 → [ 010203, 01, 02, 03 ] catenate_words Default: false When true, the filter generates catenated tokens for alphabetic character chains separated by non-alphabetic delimiters 
Example: super-duper-xl → [ superduperxl, super, duper, xl ] ⚠️ Caution when using Catenate parameters\nSetting these parameters to true generates multi-position tokens that are not supported in indexing\nIf these parameters are true, either don\u0026rsquo;t use this filter in the index analyzer or use the flatten_graph filter after this filter to make the token stream suitable for indexing\nWhen used in search analysis, catenated tokens can cause issues with match_phrase queries and other queries that rely on matching token positions. If you plan to use these queries, you should not set these parameters to true.\ngenerate_number_parts Default: true When true, the filter includes tokens composed of numbers in the output When false, the filter excludes these tokens from the output generate_word_parts Default: true When true, the filter includes tokens composed of alphabetic characters in the output When false, excludes these tokens from the output ignore_keywords Default: false When true, the filter skips tokens with the keyword attribute set to true preserve_original Default: false When true, the filter includes the original version of split tokens in the output This original version includes non-alphanumeric delimiters Example: super-duper-xl-500 → [ super-duper-xl-500, super, duper, xl, 500 ] ⚠️ Caution when using preserve_original parameter\nSetting this parameter to true generates multi-position tokens that are not supported in indexing\nIf this parameter is true, either don\u0026rsquo;t use this filter in the index analyzer or use the flatten_graph filter after this filter to make the token stream suitable for indexing\nprotected_words (Optional, array of strings)\nAn array of tokens that the filter will not split protected_words_path (Optional, string)\nPath to a file containing a list of tokens that the filter will not split This path must be an absolute or relative path to the config location, and the file must be UTF-8 encoded Each token in the file must 
be separated by a newline split_on_case_change (Optional, Boolean)\nDefault: true When true, the filter splits tokens at case transitions Example: camelCase → [ camel, Case ] split_on_numerics (Optional, Boolean)\nDefault: true When true, the filter splits tokens at letter-number transitions Example: j2se → [ j, 2, se ] stem_english_possessive (Optional, Boolean)\nDefault: true When true, the filter removes English possessives (\u0026rsquo;s) from the end of each token Example: O'Neil's → [ O, Neil ] type_table (Optional, array of strings)\nAn array of custom type mappings for characters This allows mapping non-alphanumeric characters as numeric or alphanumeric to prevent splitting at those characters Example\n[ \u0026#34;+ =\u0026gt; ALPHA\u0026#34;, \u0026#34;- =\u0026gt; ALPHA\u0026#34; ] The above array maps the plus (+) and hyphen (-) characters as alphabetical (ALPHA), so they are not treated as delimiters.\nSupported Types\nALPHA (Alphabetical) ALPHANUM (Alphanumeric) DIGIT (Numeric) LOWER (Lowercase alphabetical) SUBWORD_DELIM (Non-alphanumeric delimiter) UPPER (Uppercase alphabetical) type_table_path (Optional, string)\nPath to a custom type mapping file Example\n# Map the $, %, \u0026#39;.\u0026#39;, and \u0026#39;,\u0026#39; characters to DIGIT # This might be useful for financial data. $ =\u0026gt; DIGIT % =\u0026gt; DIGIT . =\u0026gt; DIGIT \\u002C =\u0026gt; DIGIT # in some cases you might not want to split on ZWJ # this also tests the case where we need a bigger byte[] # see https://en.wikipedia.org/wiki/Zero-width_joiner \\u200D =\u0026gt; ALPHANUM This file path must be an absolute or relative path to the config location, and the file must be UTF-8 encoded. Each mapping in the file is separated by a newline.\nUsage Cautions It\u0026rsquo;s not recommended to use the word_delimiter_graph filter with tokenizers that remove punctuation, such as the Standard tokenizer. 
This may prevent the word_delimiter_graph filter from splitting tokens correctly.\nIt may also interfere with the filter\u0026rsquo;s configurable parameters like catenate_all or preserve_original. Instead, it\u0026rsquo;s recommended to use the keyword or whitespace tokenizer.\n","date":"2024-02-02T21:30:06+09:00","image":"/posts/240202_es_analyzer3/featured.png","permalink":"/en/posts/240202_es_analyzer3/","title":"Elasticsearch Token Filter"},{"content":"\nTokenizer A Tokenizer receives a character stream and breaks it into individual tokens (usually tokenized by each word).\nThe most commonly used whitespace tokenizer splits and tokenizes based on whitespace.\nWhitespace Tokenizer Example // character streams Quick brown fox! // Result [Quick, brown, fox!] Tokenizer\u0026rsquo;s Responsibilities Ordering and position of each term (used in phrase or word proximity queries) Start and end characters of the original word before transformation are used for search snippet highlighting Token type: \u0026lt;ALPHANUM\u0026gt;, \u0026lt;HANGUL\u0026gt;, \u0026lt;NUM\u0026gt;, etc. 
(simple analyzers only provide word token types) Word Oriented Tokenizer The tokenizers below are used to tokenize full text into individual words.\nStandard Tokenizer The standard tokenizer performs tokenization based on the Unicode Text Segmentation algorithm.\nConfiguration max_token_length: Splits and tokenizes at strings exceeding the specified length Default = 255 Conversion Example POST _analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;standard\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;The 2 QUICK Brown-Foxes jumped over the lazy dog\u0026#39;s bone.\u0026#34; } // Result -\u0026gt; [ The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog\u0026#39;s, bone ] max_token_length Application Example PUT my-index-000001 { \u0026#34;settings\u0026#34;: { \u0026#34;analysis\u0026#34;: { \u0026#34;analyzer\u0026#34;: { \u0026#34;my_analyzer\u0026#34;: { \u0026#34;tokenizer\u0026#34;: \u0026#34;my_tokenizer\u0026#34; } }, \u0026#34;tokenizer\u0026#34;: { \u0026#34;my_tokenizer\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;standard\u0026#34;, \u0026#34;max_token_length\u0026#34;: 5 } } } } } POST my-index-000001/_analyze { \u0026#34;analyzer\u0026#34;: \u0026#34;my_analyzer\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;The 2 QUICK Brown-Foxes jumped over the lazy dog\u0026#39;s bone.\u0026#34; } // Result -\u0026gt; [ The, 2, QUICK, Brown, Foxes, jumpe, d, over, the, lazy, dog\u0026#39;s, bone ] Letter Tokenizer The letter tokenizer splits and tokenizes at non-letter characters. 
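As a rough, ASCII-only approximation (not the actual Lucene implementation), the letter tokenizer\u0026rsquo;s behavior can be sketched in Python:

```python
import re

def letter_tokenize(text: str) -> list[str]:
    # Split wherever a non-letter character appears and drop empty pieces.
    # Lucene's letter tokenizer uses Unicode letter classes; [A-Za-z] is an
    # ASCII-only stand-in for illustration.
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]

print(letter_tokenize("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# → ['The', 'QUICK', 'Brown', 'Foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
```

Note how the digit 2 disappears and dog\u0026rsquo;s splits into dog and s, mirroring the conversion example for this tokenizer below.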
This is suitable for European languages (English-speaking regions) but not for Asian languages, especially languages where words are not separated by spaces.\nConversion Example POST _analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;letter\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;The 2 QUICK Brown-Foxes jumped over the lazy dog\u0026#39;s bone.\u0026#34; } // Result -\u0026gt; [ The, QUICK, Brown, Foxes, jumped, over, the, lazy, dog, s, bone ] Lowercase Tokenizer The lowercase tokenizer splits and tokenizes at non-letter characters like the letter tokenizer, and additionally converts all strings to lowercase. Functionally, it is efficient as it performs both the letter tokenizer\u0026rsquo;s function and lowercase conversion in one operation.\nConversion Example POST _analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;lowercase\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;The 2 QUICK Brown-Foxes jumped over the lazy dog\u0026#39;s bone.\u0026#34; } // Result -\u0026gt; [ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ] Whitespace Tokenizer The whitespace tokenizer performs tokenization based on whitespace characters.\nConfiguration max_token_length: Splits and tokenizes at strings exceeding the specified length Default = 255 Conversion Example POST _analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;whitespace\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;The 2 QUICK Brown-Foxes jumped over the lazy dog\u0026#39;s bone.\u0026#34; } // Result -\u0026gt; [ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog\u0026#39;s, bone. 
] UAX URL Email Tokenizer The uax_url_email tokenizer is identical to the standard tokenizer, but with one difference: it recognizes URLs or email addresses and treats them as a single token.\nConfiguration max_token_length: Splits and tokenizes at strings exceeding the specified length Default = 255 Conversion Example POST _analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;uax_url_email\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;Email me at john.smith@global-international.com\u0026#34; } // Result -\u0026gt; [ Email, me, at, john.smith@global-international.com ] // If using standard tokenizer for the above example, the result would be: // Result -\u0026gt; [ Email, me, at, john.smith, global, international.com ] Configuration Example PUT my-index-000001 { \u0026#34;settings\u0026#34;: { \u0026#34;analysis\u0026#34;: { \u0026#34;analyzer\u0026#34;: { \u0026#34;my_analyzer\u0026#34;: { \u0026#34;tokenizer\u0026#34;: \u0026#34;my_tokenizer\u0026#34; } }, \u0026#34;tokenizer\u0026#34;: { \u0026#34;my_tokenizer\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;uax_url_email\u0026#34;, \u0026#34;max_token_length\u0026#34;: 5 } } } } } POST my-index-000001/_analyze { \u0026#34;analyzer\u0026#34;: \u0026#34;my_analyzer\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;john.smith@global-international.com\u0026#34; } // Result (ignores email format, max_token_length takes priority) // [ john, smith, globa, l, inter, natio, nal.c, om ] Classic Tokenizer The classic tokenizer performs grammar-based tokenization and is good for English documents. This tokenization method has special handling for abbreviations, company names, email addresses, and internet hostnames. However, these rules don\u0026rsquo;t always work and don\u0026rsquo;t work well for languages other than English.\nTokenizing Rules Splits words at most punctuation marks, removing the punctuation. However, dots not followed by whitespace are considered part of the token. 
Splits words at hyphens, but if the token contains a number, it recognizes it as a product number and doesn\u0026rsquo;t split it (e.g., 123-23). Email addresses and internet hostnames are treated as single tokens. Configuration max_token_length: Splits and tokenizes at strings exceeding the specified length Default = 255 Conversion Example POST _analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;classic\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;The 2 QUICK Brown-Foxes jumped over the lazy dog\u0026#39;s bone.\u0026#34; } // Result -\u0026gt; [ The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog\u0026#39;s, bone ] Thai Tokenizer The thai tokenizer tokenizes Thai text into words. It uses the Thai segmentation algorithm included in Java. If the input text contains strings in languages other than Thai, the standard tokenizer is applied to those strings.\n⚠️ Warning: This tokenization method may not be supported in all JREs; it is known to work with Sun/Oracle and OpenJDK. 
If considering full portability for your application, it\u0026rsquo;s recommended to use the ICU tokenizer instead\nConversion Example POST _analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;thai\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;การที่ได้ต้องแสดงว่างานดี\u0026#34; } // Result -\u0026gt; [ การ, ที่, ได้, ต้อง, แสดง, ว่า, งาน, ดี ] References Tokenizer reference | Elasticsearch Guide [8.8] ","date":"2024-02-02T21:23:06+09:00","image":"/posts/240202_es_analyzer2/featured.png","permalink":"/en/posts/240202_es_analyzer2/","title":"Elasticsearch Tokenizer"},{"content":"\nCharacter Filters Character Filter is a process that preprocesses the input string before the tokenizer stage.\nIt adds, removes, or replaces characters in strings.\nElasticsearch provides the following basic Character Filters and also allows custom filters.\nHTML Strip Character Filter Converts HTML-formatted input values into decoded values.\nConversion Example GET /_analyze { \u0026#34;tokenizer\u0026#34;: \u0026#34;keyword\u0026#34;, \u0026#34;char_filter\u0026#34;: [\u0026#34;html_strip\u0026#34;], \u0026#34;text\u0026#34;: \u0026#34;\u0026lt;p\u0026gt;I\u0026amp;apos;m so \u0026lt;b\u0026gt;happy\u0026lt;/b\u0026gt;!\u0026lt;/p\u0026gt;\u0026#34; } // Result -\u0026gt; [ \\nI\u0026#39;m so happy!\\n ] Application Method PUT /my-index-000001 { \u0026#34;settings\u0026#34;: { \u0026#34;analysis\u0026#34;: { \u0026#34;analyzer\u0026#34;: { \u0026#34;my_analyzer\u0026#34;: { \u0026#34;tokenizer\u0026#34;: \u0026#34;keyword\u0026#34;, \u0026#34;char_filter\u0026#34;: [\u0026#34;html_strip\u0026#34;] } } } } } Mapping Character Filter The Mapping Character Filter converts the input string to the corresponding key\u0026rsquo;s value when it matches a character specified as a key.\nThe matching method is greedy, converting to the most matched pattern, and the replacement value can be an empty string.\nConversion Example GET /_analyze { \u0026#34;tokenizer\u0026#34;: 
\u0026#34;keyword\u0026#34;, \u0026#34;char_filter\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;mapping\u0026#34;, \u0026#34;mappings\u0026#34;: [ \u0026#34;٠ =\u0026gt; 0\u0026#34;, \u0026#34;١ =\u0026gt; 1\u0026#34;, \u0026#34;٢ =\u0026gt; 2\u0026#34;, \u0026#34;٣ =\u0026gt; 3\u0026#34;, \u0026#34;٤ =\u0026gt; 4\u0026#34;, \u0026#34;٥ =\u0026gt; 5\u0026#34;, \u0026#34;٦ =\u0026gt; 6\u0026#34;, \u0026#34;٧ =\u0026gt; 7\u0026#34;, \u0026#34;٨ =\u0026gt; 8\u0026#34;, \u0026#34;٩ =\u0026gt; 9\u0026#34; ] } ], \u0026#34;text\u0026#34;: \u0026#34;My license plate is ٢٥٠١٥\u0026#34; } // Result -\u0026gt; [ My license plate is 25015 ] Pattern Replace Character Filter The pattern_replace filter converts strings matching a regular expression to a specified string.\n⚠️ Warning: Regular expressions follow Java regex, and poorly written regex can cause performance degradation or StackOverflow errors, and may suddenly terminate running nodes.\nParameters pattern: Java regular expression replacement: String to replace with flags: Java regular expression flags, separated by | (e.g., \u0026ldquo;CASE_INSENSITIVE|COMMENTS\u0026rdquo;) Conversion Example PUT my-index-000001 { \u0026#34;settings\u0026#34;: { \u0026#34;analysis\u0026#34;: { \u0026#34;analyzer\u0026#34;: { \u0026#34;my_analyzer\u0026#34;: { \u0026#34;tokenizer\u0026#34;: \u0026#34;standard\u0026#34;, \u0026#34;char_filter\u0026#34;: [\u0026#34;my_char_filter\u0026#34;] } }, \u0026#34;char_filter\u0026#34;: { \u0026#34;my_char_filter\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;pattern_replace\u0026#34;, \u0026#34;pattern\u0026#34;: \u0026#34;(\\\\d+)-(?=\\\\d)\u0026#34;, \u0026#34;replacement\u0026#34;: \u0026#34;$1_\u0026#34; } } } } } POST my-index-000001/_analyze { \u0026#34;analyzer\u0026#34;: \u0026#34;my_analyzer\u0026#34;, \u0026#34;text\u0026#34;: \u0026#34;My credit card is 123-456-789\u0026#34; } // Result -\u0026gt; [ My, credit, card, is, 123_456_789 ] References Character filters reference | 
Elasticsearch Guide [8.8] ","date":"2024-02-02T21:00:06+09:00","image":"/posts/240202_es_analyzer1/featured.png","permalink":"/en/posts/240202_es_analyzer1/","title":"Elasticsearch Character Filter"},{"content":"Elasticsearch is a Java-based open-source distributed search engine built on Apache Lucene.\nThrough Elasticsearch, you can use the Lucene library (an information retrieval library developed in Java) independently, and perform near real-time storage, search, and analysis of massive amounts of data.\nElasticsearch can be used standalone for search purposes, or as part of the ELK (Elasticsearch / Logstash / Kibana) stack.\nELK Stack Components Elasticsearch: Searches and aggregates data received from Logstash to obtain necessary information Logstash: Collects, aggregates, and parses logs or transaction data from various sources (DB, CSV files, etc.) and delivers them to Elasticsearch Kibana: Visualizes and monitors data through Elasticsearch\u0026rsquo;s fast search capabilities 💡 The ELK stack is primarily used to collect scattered logs from load-balanced WAS into one place, quickly search for desired data, and visualize it for monitoring purposes.\nElasticsearch and RDB Terminology Comparison Elasticsearch 7.0+ Allows Only One Type Per Index The reason is that Elasticsearch uses the same Lucene fields for types within a single index (DB). Therefore, even if types are different, fields with the same name are not independent, which can cause various problems. As a result, it was modified so that one index can only have one type.\nComparison with RDB In the case of RDB\nA single DB can have multiple tables, and columns with the same name in each table do not affect each other. In the case of Elasticsearch\nIf there are fields (=columns) with the same name in each type (=table) within one index (=DB), those fields are not independent and are stored in the same Lucene field, requiring the same definition. 
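The constraint just described can be sketched in Python (the class and field names here are hypothetical, not an Elasticsearch API): modeling an index as one shared field table shows why two types could not define the same field name differently.

```python
class ToyIndex:
    """Toy model: every 'type' in one index shares a single field table,
    mirroring how Lucene fields were shared across mapping types."""

    def __init__(self):
        self.fields = {}  # field name -> field datatype, shared by all "types"

    def define_field(self, name: str, datatype: str):
        # A second "type" may reuse a field name only with the identical definition
        if name in self.fields and self.fields[name] != datatype:
            raise ValueError(f"field '{name}' already defined as '{self.fields[name]}'")
        self.fields[name] = datatype

idx = ToyIndex()
idx.define_field("created_at", "date")      # "type" A defines the field
idx.define_field("created_at", "date")      # "type" B: identical definition is fine
try:
    idx.define_field("created_at", "text")  # "type" C: conflicting definition fails
except ValueError as err:
    print(err)
```

Allowing only one type per index removes this conflict entirely, which is exactly what Elasticsearch 7.0+ did.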
Elasticsearch Architecture Cluster A cluster is the largest system unit in Elasticsearch, consisting of a collection of nodes with at least one or more nodes.\nDifferent clusters are maintained as independent systems that cannot access or exchange data Multiple servers can form a single cluster Multiple clusters can exist on a single server Node A node is a single server included in a cluster that stores data and participates in the cluster\u0026rsquo;s indexing and search capabilities. Nodes are classified according to their roles as follows:\nMaster-eligible Node A node that can be selected as a master to control the cluster\nCreating and deleting indices Tracking and managing cluster nodes Selecting shards to allocate when data is input Data Node A node where data (Documents) is stored and where shards, the spaces for distributed data storage, are placed\nPerforms data operations such as CRUD, indexing, searching, and statistics Requires significant resources (CPU, memory, etc.) Requires monitoring and should be separated from master nodes Ingest Node Executes pre-processing pipelines such as data transformation\nCoordination Only Node A node that receives user requests and distributes them in a round-robin manner\nForwards cluster-related requests to the master node Forwards data-related requests to data nodes Performs load balancing role Index / Shard / Replication Index A concept corresponding to a database in RDB\nShard Data indexed within an index does not exist as a single unit but is divided into multiple parts. 
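How a document lands in one of those parts can be sketched as follows (Elasticsearch hashes the document\u0026rsquo;s _routing value, which defaults to its _id, with Murmur3; Python\u0026rsquo;s built-in hash stands in here purely for illustration):

```python
NUM_PRIMARY_SHARDS = 3  # fixed when the index is created

def route_to_shard(doc_id: str) -> int:
    # shard = hash(_routing) % number_of_primary_shards
    return hash(doc_id) % NUM_PRIMARY_SHARDS

shard = route_to_shard("user-42")
assert 0 <= shard < NUM_PRIMARY_SHARDS
print(f"document 'user-42' is stored on primary shard {shard}")
```

This formula is also why the number of primary shards cannot be changed after index creation: changing the modulus would route existing documents to the wrong shard.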
A single index is split into multiple shards for scale-out purposes.\n💡 Shards are divided into primary shards and replica shards.\nPrimary Shard\nThe original data Data update requests are sent to the primary shard Updated content is replicated to replica shards Replica Shard\nA copy of the primary shard Used as a replacement when the original data is lost, performing a role in overcoming failures By default, assigned to a different node than the primary shard Segment A segment is a data structure designed for fast document search in Elasticsearch and is a physical file containing shard data.\nSegment Characteristics Each shard consists of multiple segments, enabling efficient search through distributed processing of search requests When searching a shard, each segment is searched first, results are combined, and the final result is returned as the shard\u0026rsquo;s result Data indexed within segments is stored in an inverted index structure, making search speed very fast Segment Creation Process Creating a new segment for every request would generate too many segments, so an in-memory buffer is used to prevent this.\nFlush: When the content accumulated in the in-memory buffer reaches a certain time or the buffer is full, flush is performed and segments are created in the system cache\nData becomes searchable from this point In this state, the segment is stored in the system cache, not on disk Commit: After a certain time, segments are stored on physical disk through commit\nMerge: Stored segments are merged into one over time\nBy merging segments into one, the number of segments to search decreases, improving search performance ","date":"2024-02-01T21:07:14+09:00","image":"/posts/240201_es/featured.png","permalink":"/en/posts/240201_es/","title":"What is ElasticSearch?"},{"content":"Clustering SELECT * FROM user_signups WHERE country = \u0026#39;Lebanon\u0026#39; AND registration_date = \u0026#39;2023-12-01\u0026#39; Through clustering, BigQuery can perform less 
work in accessing data, thereby increasing query speed. However, before clustering, it\u0026rsquo;s worth weighing the amount of data in the table against the overhead of maintaining the clustering.\nFor example, if a BigQuery column-based table has only 10 rows of data, the cost of clustering will exceed the cost of simply doing a full scan.\nQuoting a former Google engineer, if the data group for clustering is less than 100MB, doing a full scan might be better than clustering.\nReference: Google BigQuery clustered table not reducing query size\nImportant Note\nAdditionally, if your queries don\u0026rsquo;t filter on the clustered column, clustering provides no performance benefit at all.\nExample of Creating Clustered Tables CREATE TABLE `myproject.mydataset.clustered_table` ( registration_date DATE, country STRING, tier STRING, username STRING ) CLUSTER BY country; Clustering Features\nCan cluster up to 4 columns maximum Unlike partitioning, not limited to INT64 and DATE types only Can also use types like STRING and GEOGRAPHY Combine Clustering with Partitioning Using partitioning and clustering together enables more efficient data access.\nCombination Strategy\nPartitioning: Date-based data division Clustering: Additional sorting within partitions Query performance and cost optimization References Add One Line of SQL to Optimise Your BigQuery Tables Use the Partitions, Luke! 
A Simple and Proven Way to Optimise Your SQL Queries ","date":"2023-12-20T21:50:16+09:00","image":"/posts/231220_bigquery/featured.png","permalink":"/en/posts/231220_bigquery/","title":"BigQuery Clustering Optimization"},{"content":"BigQuery Features Column-based Database Unlike typical RDBs that store data in row units, when accessing data in a specific column, it scans only the column file you\u0026rsquo;re looking for without scanning the entire row.\nAdvantageous for analytical database (OLAP) operations that read only specific columns to count or calculate statistics.\nAdvantages\nFast queries with column-level scanning Optimized for analytical queries Storage efficiency Data Processing Architecture Colossus (Distributed Storage)\nGoogle\u0026rsquo;s cluster-level file system succeeding Google File System (GFS) Provides storage at the bottom layer Communicates with compute nodes through Jupiter, a TB-level network Compute Layers (Leaf, Mixer1, Mixer0)\nProcess data read from Colossus without disks Each layer passes data up to the layer above High-speed computation through distributed parallel processing No Key, No Index No concept of keys and indexes. 
Full scan only\nFeatures\nNo need for index management Performance achieved through column-based scanning Optimized for large-scale data analysis No Update, Delete Only additions are allowed for performance, and once data is entered, it cannot be modified or deleted.\nIf data is entered incorrectly, the table must be deleted and recreated.\nConstraints\nOnly INSERT supported UPDATE/DELETE not supported Requires recreation when modifying data Eventual Consistency Data is replicated to 3 data centers, so it may not be immediately available for querying after writing.\nFeatures\nHigh availability through triple replication Eventual consistency guarantee Reads may not be immediately available after writes References BigQuery Performance/Cost Tips How to Use BigQuery UNNEST, ARRAY, STRUCT Introduction to Google Big Data Platform BigQuery Architecture ","date":"2023-12-18T21:37:29+09:00","image":"/posts/231218_bigquery/featured.png","permalink":"/en/posts/231218_bigquery/","title":"What is BigQuery?"},{"content":"QueryString Processing Methods Comparison Let\u0026rsquo;s compare two methods for handling QueryString in Spring.\nMethod 1: Receiving as Object (ParameterObject) @Operation(tags = {\u0026#34;swagger\u0026#34;}) @GetMapping(\u0026#34;/hello/parameters1\u0026#34;) public ResponseEntity\u0026lt;List\u0026lt;ResponseTest\u0026gt;\u0026gt; parameterObjectTest(ParameterObjectReq req) { ResponseTest response = new ResponseTest(req.email(), req.password(), req.occupation()); return ResponseEntity.ok(List.of(response)); } Method 2: Receiving as Individual Parameters (@RequestParam) @Operation(tags = {\u0026#34;swagger\u0026#34;}) @GetMapping(\u0026#34;/hello/parameters2\u0026#34;) public ResponseEntity\u0026lt;List\u0026lt;ResponseTest\u0026gt;\u0026gt; parameterObjectTest2( @RequestParam(value = \u0026#34;email\u0026#34;) String email, @RequestParam(value = \u0026#34;pw\u0026#34;) String password, @RequestParam(value = \u0026#34;oq\u0026#34;) OccupationStatus status ) 
{ ResponseTest response = new ResponseTest(email, password, status); return ResponseEntity.ok(List.of(response)); } Model Definition ParameterObjectReq (Request DTO)\npublic record ParameterObjectReq( String email, String password, OccupationStatus occupation ) { } OccupationStatus (Enum)\npublic enum OccupationStatus { STUDENT, EMPLOYEE, UNEMPLOYED } Differences Between the Two Approaches While @RequestParam is commonly used for receiving QueryString requests, when there are many parameters, you can receive the QueryString as an Object like in the first method.\n@RequestParam vs ParameterObject @RequestParam: By default set to required = true, making request values mandatory. ParameterObject: Spring automatically binds QueryString to the object\u0026rsquo;s field values without any special annotation. However, since required is not set by default, null values can be passed. ParameterObject vs @RequestParam Conversion in Springdoc Using ParameterObject Using @RequestParam Using @ParameterObject Annotation To make Springdoc convert ParameterObject like when using @RequestParam and display the Required status, configure as follows.\nCode Example @ParameterObject public record ParameterObjectReq( @NotNull String email, @NotNull String password, OccupationStatus occupation ) { } @ParameterObject is a Springdoc annotation. 
When receiving multiple QueryStrings as an Object, specifying it on the class will make it recognize and convert like @RequestParam.\nJSR-303 Support Springdoc supports JSR-303 and allows the following validation annotations:\n@NotNull @Min, @Max @Size Other validation annotations According to Springdoc official documentation\nThis library supports\nOpenAPI 3 Spring-boot (v1, v2 and v3) JSR-303, specifically for @NotNull, @Min, @Max, and @Size Swagger-ui OAuth 2 GraalVM native images Conversion Result The spec file is written so that ParameterObject is also recognized as @RequestParam, and occupation without @NotNull is displayed as optional in the Required field.\nLeft Image: When @ParameterObject is specified, Springdoc recognizes it and converts to proper spec\nSwagger2 → Swagger3 Annotations Swagger2 Swagger3 Description @Api @Tag Displays swagger resource at class level (for grouping)\nname : Tag name\ndescription : Tag description @ApiIgnore @Parameter(hidden = true)\n@Operation(hidden = true)\n@Hidden This annotation allows hiding parameters in swagger-ui.\nFor requestBody or ResponseBody, use\n@JsonProperty(access = JsonProperty.Access.READ_ONLY) @ApiImplicitParam @Parameter Configuration and resource display for single RequestParam @ApiImplicitParams @Parameters Configuration for multiple RequestParams @ApiModel @Schema description : Human-readable name\ndefaultValue : Default value\nallowableValues : Allowable values (set when enumerable) @ApiModelProperty(hidden = true) @Schema(accessMode = READ_ONLY) @ApiOperation(value = \u0026ldquo;foo\u0026rdquo;, notes = \u0026ldquo;bar\u0026rdquo;) @Operation(summary = \u0026ldquo;foo\u0026rdquo;, description = \u0026ldquo;bar\u0026rdquo;) summary : Brief description of API\ndescription : Detailed description of API\nresponses : List of API responses\nparameters : List of API parameters @ApiParam @Parameter name : Parameter name\ndescription : Parameter description\nin : Parameter location (query, header, path, 
cookie) @ApiResponse(code = 404, message = \u0026ldquo;foo\u0026rdquo;) @ApiResponse(responseCode = \u0026ldquo;404\u0026rdquo;, description = \u0026ldquo;foo\u0026rdquo;) responseCode : HTTP status code\ndescription : Response description\ncontent : Response payload structure\nschema : Schema used in payload\nhidden : Whether to hide schema\nimplementation : Schema target class When using an object to capture multiple request query params, use the @ParameterObject annotation on that method argument This step is optional: Replace with GroupedOpenApi bean only if you have multiple Docket beans Using @Tag Annotation The @Tag annotation enables the following grouping:\nGrouping by Controller Grouping by method within Controller Grouping in spec file conversion according to the name specified in @Tag File creation with that name when generating client code using OpenAPI Generator Cautions When Using Multiple @Tag Question: What happens if you group with @Tag at the top level and set a different tag name in the @Operation of a lower-level method?\nTest Result When @Tag(name = \u0026quot;swagger\u0026quot;) is set at the top level and tags = {\u0026quot;swagger123\u0026quot;} is added in the @Operation of the postHello method, the same endpoint is created as duplicate groups.\nProblem Using OpenAPI Generator in this state causes the problem of duplicate client code being generated as shown below.\nRecommendation: Unless there\u0026rsquo;s a special case, it\u0026rsquo;s recommended to use @Tag grouping only at the Controller\u0026rsquo;s top level.\nFile Naming for OpenAPI Generator Client Code Generation When generating client code, the name specified in @Tag + -api is added as a postfix. 
To customize this, you need to modify the Mustache file.\nReferences Using Templates | OpenAPI Generator Mustache.js GitHub OpenAPI Generator Usage Guide Auto-generating Safe Models and Standardized Implementation Code with OpenAPI Generator Authentication-Related OpenAPI Specs OpenAPI supports various authentication methods. Key configuration items are as follows.\ntype (Authentication Format) Currently supports API Key, HTTP, OAuth2, and OpenID Connect methods. Note: OpenAPI v2 spec does not support OpenID Connect method.\nSupported Types\nhttp: Basic, Bearer and other HTTP authentication schemes apiKey: API key and cookie authentication oauth2: OAuth2 authentication openIdConnect: OpenID Connect discovery Key Configuration Items name: Authentication key name (required when using API Key method) in: Specifies authentication key location (choose from query, header, cookie, required when using API Key method) scheme: Specifies authentication method (Basic or Bearer, required when using HTTP authentication) bearerFormat: Bearer token format (commonly JWT) flows: OAuth2 flow type (choose from implicit, password, clientCredentials, authorizationCode) openIdConnectUrl: OpenID Connect URL (recommended to use OAuth2 or Bearer token method as alternative in OpenAPI v2 spec) @Deprecated Strategy When there are changes to DTO specs due to API version updates, use the following phased strategy.\nPhase 1: Mark with @Deprecated First, add the @Deprecated annotation to the field that will change.\npublic class UserDto { @Deprecated private String oldField; private String newField; } The OpenAPI spec will also show deprecated for that schema field, and when generating code on the frontend, the field will be marked as deprecated. 
This notifies the frontend team in advance that the field will be removed soon.\nPhase 2: Apply @Schema(hidden = true) Once the frontend has completed migration to the new spec, add @Schema(hidden = true) to the @Deprecated field on the server so that the field is no longer generated in the OpenAPI spec.\npublic class UserDto { @Deprecated @Schema(hidden = true) private String oldField; // Excluded from spec private String newField; } Phase 3: Remove Field After sufficient time has passed, completely remove the field.\nThis phased approach enables safe API version management between frontend and backend.\n","date":"2023-10-17T17:12:52+09:00","image":"/posts/231017_swagger/featured.png","permalink":"/en/posts/231017_swagger/","title":"Springdoc and OpenAPI (Annotation Usage Guide)"},{"content":"What is Swagger? Swagger is a framework for OAS (OpenAPI Specification) that allows you to specify and manage the specifications of APIs. Through Swagger, you can design, build, and document REST API services.\nSwagger Tools Swagger provides the following key tools\nSwagger UI: A tool that visualizes Swagger API specifications in HTML format for easy viewing Swagger Codegen: A CLI tool that automatically generates client and server code based on Swagger specifications Swagger Editor: An editor for creating API design documents and specifications according to Swagger standards Springfox vs Springdoc Springfox Swagger is a library that helps you easily write API documentation using Swagger in projects using Spring or Spring Boot.\nWhen Springfox stopped updating, Springdoc emerged and rapidly gained popularity with active updates. 
Springdoc is also a library that supports Swagger documentation creation and is now the more recommended choice over Springfox.\nSwagger Codegen vs OpenAPI Generator Swagger is a trademark of SmartBear, and Swagger Codegen is a project included within it.\nOpenAPI Generator is a community-driven open-source project that started as a fork of the Swagger Codegen project. Currently, more than 40 top project contributors and founding members of Swagger Codegen are participating together.\nOpenAPI Generator License OpenAPI Generator follows Apache License 2.0.\nWhat is Apache License 2.0? Apache License 2.0 grants the following rights\nAnyone can create programs derived from the software Copyright can be transferred and transmitted Parts or the whole can be used for personal or commercial purposes When redistributing, you don\u0026rsquo;t necessarily have to include the original or modified source code However, you must include the Apache License version and notice (clearly indicating that the software was developed under Apache License) Background of OpenAPI Generator\u0026rsquo;s Birth The official Q\u0026amp;A of OpenAPI Generator reveals the background of the project\u0026rsquo;s creation\nDifference in Version Philosophy The founding members of Swagger Codegen felt that Swagger Codegen 3.0.0 was too different from the 2.x philosophy. They were concerned that the overhead of maintaining two separate branches (2.x, 3.x) could cause problems similar to those experienced by the Python community.\nFaster Release Cycle The founding members wanted a faster release cycle so users wouldn\u0026rsquo;t have to wait months to use the stable release version they wanted.\nWeekly patch releases Monthly minor releases Community-Driven Development Proceeding as a community-driven open-source project ensures innovation, reliability, and a roadmap owned by the community.\nFor these reasons, the OpenAPI Generator project was born. 
The relationship between Swagger Codegen and OpenAPI Generator feels somewhat like that between MySQL and MariaDB.\nMigrating from Swagger Codegen According to the official documentation, if you\u0026rsquo;re currently using Swagger Codegen 2.x, you can conveniently migrate to OpenAPI Generator. This is because OpenAPI Generator is based on Swagger Codegen version 2.4.0-SNAPSHOT.\nFor detailed migration instructions, refer to the official guide\nMigrating from Swagger Codegen | OpenAPI Generator References OpenAPI Generator Official Site Migrating from Swagger Codegen ","date":"2023-10-16T16:56:35+09:00","image":"/posts/231016_swagger/featured.png","permalink":"/en/posts/231016_swagger/","title":"Mastering OpenAPI Generator"},{"content":" What is Nginx? Nginx (engine-x) is a lightweight, high-performance web server. It functions not only as a web server but also as a reverse proxy, load balancer, and HTTP cache.\nNginx was designed to handle high concurrent connections and is currently used by numerous large-scale websites worldwide.\nWhy Was Nginx Needed? In the past, the Apache web server was the industry standard. However, in the early 2000s, as internet users grew exponentially, a bottleneck known as the C10k problem emerged.\nThe C10k Problem The C10k problem stands for \u0026ldquo;Connection 10,000,\u0026rdquo; meaning handling 10,000 concurrent client connections on a single server.\nℹ️ Important Concept Distinction\nConcurrent Processing: Maintaining and managing many connections simultaneously Throughput: Number of requests that can be processed per second Concurrent connection handling focuses on efficient resource management and scheduling rather than raw speed.\nApache\u0026rsquo;s Architectural Limitations Traditional Apache had the following structural issues:\n1. Process-Based Processing Creates a new process or thread for each incoming request Number of processes increases proportionally with users Results in memory exhaustion 2. 
High Resource Consumption Apache\u0026rsquo;s powerful extensibility allows various module additions However, each process loads all modules into memory Memory usage per process increases 3. Context-Switching Overhead CPU cores alternate between multiple processes Context-switching costs occur during process transitions CPU overhead increases with more requests Due to these issues, Apache was unsuitable for large-scale concurrent connection environments.\nThe Birth of Nginx In 2002, Russian developer Igor Sysoev began developing Nginx to solve this problem, releasing the first version in 2004.\nNginx\u0026rsquo;s Core Goals High concurrent connection handling Low memory footprint High performance and stability Nginx\u0026rsquo;s Primary Roles HTTP Server: Quickly serves static files (HTML, CSS, JS, images) Reverse Proxy Server: Relays requests to backend application servers Load Balancer: Distributes traffic across multiple servers Mail Proxy Server: Mail server proxy functionality Nginx Internal Architecture Nginx consists of 1 Master Process and multiple Worker Processes.\nMaster Process Responsibilities The Master Process handles:\nReading and validating configuration files Creating and managing Worker Processes Restarting Worker Processes on configuration changes # Check Master Process ps aux | grep nginx Worker Process Responsibilities Worker Processes handle actual client requests:\n1. Connection Management Receives listen socket from Master Process Forms connections with clients Maintains connections for Keep-Alive duration One Worker handles thousands of connections simultaneously 2. Non-blocking I/O Processes other tasks when no requests on connection Responds immediately when requests arrive Efficient processing via asynchronous Event-Driven approach 3. Thread Pool Delegates time-consuming tasks (file I/O, DB queries) to Thread Pool Worker Process continues handling other requests Minimizes impact of blocking operations 4. 
CPU Core Optimization Worker Processes are typically created equal to CPU core count Each Worker pinned to specific CPU core (CPU Affinity) Minimizes Context-Switching for performance improvement # nginx.conf configuration example worker_processes auto; # Auto-create based on CPU core count worker_cpu_affinity auto; # Auto-set CPU affinity Event-Driven Architecture Nginx operates with Multi-process + Single-thread + Event-Driven approach:\nEvent Handler manages multiple connections Processes via asynchronous Non-blocking method Executes ready events sequentially Maximizes resource efficiency without idle processes This allows efficient memory and CPU usage without processes waiting idle for requests like Apache.\nNginx Advantages and Disadvantages Advantages 1. High Concurrent Connection Capability 10x more concurrent connections compared to Apache 2x faster processing speed for same connection count 2. Low Resource Usage Operates with fewer processes Minimized memory usage Fast response times with lightweight structure 3. Zero-Downtime Configuration Reload nginx -s reload # Apply configuration without service interruption Master Process reads new configuration Existing Workers finish current requests then terminate New Workers handle requests with new configuration Configuration changes without service interruption 4. Superior Static File Handling Quickly serves static content like images, CSS, JS Better static file performance than Apache Disadvantages 1. Difficult Dynamic Module Development Worker Process restart needed when adding modules Harder module development compared to Apache Partially compensated by Lua scripting 2. Windows Environment Limitations Optimized for Linux/Unix environments Performance and stability degraded on Windows Linux recommended for production environments 3. 
No .htaccess Support Cannot use Apache\u0026rsquo;s .htaccess files All configuration managed in central config file May lack flexibility in hosting environments Key Nginx Features 1. Reverse Proxy A reverse proxy acts as an intermediary between clients and backend servers.\nKey Benefits Enhanced Security: Hides actual server IP Caching: Caches frequently requested responses Compression: Saves bandwidth by compressing response data SSL Processing: Handles HTTPS encryption/decryption # Reverse proxy configuration example location / { proxy_pass http://backend_server; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; } Practical Usage Patterns Nginx + Apache: Nginx handles static files, Apache handles dynamic processing Nginx + Node.js/Python/Java: Nginx protects frontend and backend applications Nginx + Nginx: Hierarchical configuration of multiple Nginx servers 2. Load Balancing Distributes traffic across multiple backend servers to balance load evenly.\nLoad Balancing Algorithms Round Robin (Default) Distributes requests sequentially to each server Simplest and most fair approach upstream backend { server backend1.example.com; server backend2.example.com; server backend3.example.com; } Least Connections Sends to server with fewest current connections Suitable for requests with varying processing times upstream backend { least_conn; server backend1.example.com; server backend2.example.com; } IP Hash Determines server based on client IP hash Useful for Session Persistence upstream backend { ip_hash; server backend1.example.com; server backend2.example.com; } Weight Assigns weight based on server performance Sends more requests to high-performance servers upstream backend { server backend1.example.com weight=3; server backend2.example.com weight=2; server backend3.example.com weight=1; } Health Check upstream backend { server backend1.example.com max_fails=3 fail_timeout=30s; server backend2.example.com max_fails=3 fail_timeout=30s; } max_fails: 
Number of allowed failures fail_timeout: Time to consider server down Improves availability by automatically excluding failed servers 3. SSL/TLS Termination Nginx handles HTTPS communication with clients and HTTP communication with backend.\nKey Benefits Removes SSL processing burden from backend servers Centralized certificate management Backend focuses on business logic Nginx and backend communicate via HTTP on same internal network (security safe) server { listen 443 ssl http2; server_name example.com; ssl_certificate /path/to/cert.pem; ssl_certificate_key /path/to/key.pem; ssl_protocols TLSv1.2 TLSv1.3; ssl_ciphers HIGH:!aNULL:!MD5; location / { proxy_pass http://backend; } } HTTP/2 Support Nginx supports HTTP/2:\nMultiplexing: Multiple requests simultaneously over one connection Header Compression: Saves bandwidth Server Push: Sends resources before client requests 4. Caching Stores server responses in memory or disk for fast responses on repeated requests.\n# Cache path configuration proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m max_size=1g; server { location / { proxy_cache my_cache; proxy_cache_valid 200 60m; # Cache 200 responses for 60 minutes proxy_cache_valid 404 10m; # Cache 404 responses for 10 minutes proxy_pass http://backend; } } Caching Strategies Proxy Caching: Cache backend responses FastCGI Caching: Cache dynamic content like PHP-FPM Static File Caching: Set browser cache headers # Static file cache header configuration location ~* \\.(jpg|jpeg|png|gif|ico|css|js)$ { expires 1y; add_header Cache-Control \u0026#34;public, immutable\u0026#34;; } 5. Compression (Gzip) Compress response data to save network bandwidth\ngzip on; gzip_vary on; gzip_min_length 1024; gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml+rss application/json application/javascript; Compress text-based content by 60-80% Improves user experience by reducing transfer time 6. 
Rate Limiting Defend against DDoS attacks and protect servers\n# Define zone limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s; server { location /api/ { limit_req zone=mylimit burst=20 nodelay; proxy_pass http://backend; } } Limit requests per second per IP burst: Allow sudden traffic spikes Essential for API server protection Nginx vs Apache: Which Should You Choose? Choose Nginx When High concurrent connection handling is needed Static file service is primary purpose Reverse proxy/load balancer is needed Resource efficiency is important Modern protocol support needed (HTTP/2, HTTP/3) Choose Apache When .htaccess file-based configuration is needed Various third-party modules are needed Must use in Windows environment Legacy application compatibility is important Frequent dynamic module development Optimal Combination: Nginx + Apache Many companies use Nginx as frontend and Apache as backend:\n[Client] → [Nginx] → [Apache] → [Application] Static Dynamic SSL PHP/Python Caching Modules Production Tips 1. Worker Connections Configuration events { worker_connections 1024; # Connections per Worker use epoll; # Optimal event model for Linux } 2. Keepalive Optimization http { keepalive_timeout 65; keepalive_requests 100; } 3. Buffer Size Tuning http { client_body_buffer_size 16K; client_header_buffer_size 1k; client_max_body_size 8m; large_client_header_buffers 4 8k; } 4. Log Optimization http { access_log /var/log/nginx/access.log combined buffer=32k; error_log /var/log/nginx/error.log warn; } 5. Security Hardening # Hide version information server_tokens off; # Add security headers add_header X-Frame-Options \u0026#34;SAMEORIGIN\u0026#34; always; add_header X-Content-Type-Options \u0026#34;nosniff\u0026#34; always; add_header X-XSS-Protection \u0026#34;1; mode=block\u0026#34; always; Conclusion Nginx has established itself as a core component of modern web infrastructure. 
With high performance and efficiency through Event-Driven architecture, it\u0026rsquo;s used by large-scale services like Netflix, Airbnb, and GitHub.\nWhile Apache\u0026rsquo;s stability and extensibility remain valuable, in modern web environments where large-scale traffic handling and resource efficiency are crucial, Nginx is the more suitable choice.\n💡 Recommended Learning Path\nInstall Nginx in local environment and practice basic configuration Set up reverse proxy Configure and test load balancing Apply SSL certificates (Let\u0026rsquo;s Encrypt) Performance monitoring and optimization References Nginx Official Documentation Nginx Configuration Generator Nginx Performance Tuning Guide Important Note: Nginx shows limited performance and compatibility on Windows environments, so it\u0026rsquo;s strongly recommended to use Linux/Unix systems in production!\n","date":"2022-10-25T19:17:04+09:00","image":"/posts/221025_about_nginx/featured.png","permalink":"/en/posts/221025_about_nginx/","title":"What is Nginx? 
Evolution and Architecture of Web Servers"},{"content":" Following up on the previous post, I\u0026rsquo;d like to organize Docker commands and usage methods =)\nInstalling Docker First, we need to install Docker to use it, right?\nIn my case, I installed Docker in an Ubuntu environment using an AWS EC2 instance.\nIf you need installation instructions for Ubuntu or other environments, please refer to the official documentation below!\nInstall Docker Engine on Ubuntu - docs.docker.com\nRemoving Old Docker Versions and Installing New Version If you want to remove the old version of Docker and install the new version, use the following commands to remove the old version.\nFor Ubuntu\nsudo apt-get remove docker docker-engine docker.io containerd runc Update repository\nsudo apt-get update Install packages to allow apt to use repository over https\nsudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common Add Docker\u0026rsquo;s official GPG key to apt\ncurl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - Add Docker repository\nsudo add-apt-repository \u0026#34;deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable\u0026#34; Update apt to reflect the changes\nsudo apt-get update Install Docker\nsudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin After completing the Docker installation with the above commands, we need to verify it!\nsudo docker version or\nsudo docker run hello-world When you use the run command as above, it will search for the hello-world image locally, and if it doesn\u0026rsquo;t exist, it will download the image from Docker Hub and run it as a container =)\nDocker Permission Settings If you see the following message when executing Docker commands, don\u0026rsquo;t worry haha\nThis message appears because users other than root don\u0026rsquo;t have permission to use Docker =)\nFirst, check Docker permissions.\ncat /etc/group | 
grep docker In my case, I\u0026rsquo;ve already added user permissions, so you can see the username \u0026ldquo;ubuntu\u0026rdquo; added at the end.\nIf it\u0026rsquo;s not added, you\u0026rsquo;ll see something like docker:x:999:!\nAdd your user ID to the Docker group\nsudo usermod -aG docker [username] For the username, as shown in the example above, I used \u0026ldquo;ubuntu\u0026rdquo; as my username.\nFor Linux, the default username is typically set to ec2-user =)\nReboot the system.\nsudo reboot Now let\u0026rsquo;s check the version again without sudo before the docker command.\ndocker version If you see both Client and Server information as shown above, the permissions have been successfully granted. =)\nIf you still see the Got permission denied ... message, the permission settings were not properly configured, so please check again and search for any additional error messages!\nIf you share error messages in the comments, I\u0026rsquo;ll help you troubleshoot them too haha\nWindows \u0026amp; Mac OS Docker Desktop + Additionally, Windows and Mac OS support Docker Desktop, which provides an easy installation with a GUI!\nPlease note that it\u0026rsquo;s only free for individual users, companies with fewer than 250 employees, or companies with less than $10 million in revenue!\nFor installation instructions, please refer to the official documentation links below =)\nInstall Docker Desktop on Windows - docs.docker.com\nInstall Docker Desktop on Mac - docs.docker.com\nDocker Commands Docker Image Search Search for images in Docker Hub, Docker\u0026rsquo;s official registry.\nsudo docker search [Image name to search] Download Image sudo docker pull [image name]:[tag] Generally, if you don\u0026rsquo;t specify a tag name when creating an image, the default value \u0026ldquo;latest\u0026rdquo; is attached =)\nPush Image to Docker Hub Account In my case, I\u0026rsquo;ll push the hello-world image I just downloaded to my docker-hub account.\nBefore pushing, 
let\u0026rsquo;s create a repository called hello-world on docker-hub.\nsudo docker push [docker-hub ID]/[image name]:[tag] Hmm\u0026hellip; the image push failed with the following message. =(\nThe reason is that the repository name on Docker Hub and the local Docker image repository name must match.\nSolution\nsudo docker image tag [image repo name]:[tag] [new image repo name]:[tag] This method doesn\u0026rsquo;t rename the image but copies it and creates a new image with a new name. =)\nIn my case, I didn\u0026rsquo;t specify the tag part separately because the changed repo will also use the default \u0026ldquo;latest\u0026rdquo;.\nLet\u0026rsquo;s try pushing again haha.\nNow you can see that the image has been pushed successfully! =)\nCheck Downloaded Images sudo docker images Run Docker Image as Container docker run -d -i -t --name [container name] -p [host port:container port] [image name or ID] In my case, I already have a Spring Boot project built as a Docker image, so I\u0026rsquo;ll run that image as a container. =)\nGenerally, the -i and -t options are used together as -it =)\nhost port is the external port that users will access after the container is launched, and container port is the port specified when building the Docker image using a Dockerfile. =)\nIn my case, when creating the image, I specified the dev environment among the local, dev, and prod environments in the .yml file, and the server port for that dev environment was set to 8081, so I specified the containerport as 8081!\nFor image name or ID, you can enter the name of the image to run or its ID value. =)\nVarious Docker options are organized below, so please refer to them!\nNow that we\u0026rsquo;ve run the image, we need to check if the container is running properly, right?\nCheck Running Containers sudo docker ps After the container is running, when I access http://[public ip]:8080, I can confirm that the server is running well! 
=)\nI was also able to confirm through the API I created that the operating environment of the currently running server is the dev1 environment specified in the dockerfile. =)\nPort Mapping Experiment Let me do one more experiment here. As I mentioned earlier, I specified dev1 as the operating environment in the Docker file, and the image built through that dockerfile internally has a server port of 8081.\nSo what happens if I run this image with -p 8080:8080, setting the container port to 8080 instead of 8081??\nFirst, the container launched successfully!\nNow let\u0026rsquo;s try accessing the server\u0026rsquo;s IP and external access port 8080!\nThis time, the container launched successfully, but the server doesn\u0026rsquo;t seem to be working properly =)\nThis confirms that the operating environment settings specified in the dockerfile are working correctly!\nBasic Docker Commands $ sudo docker pull [image name to download]:[tag] $ sudo docker push [docker-hub ID]/[image name]:[tag] $ docker images # View images that exist locally, downloaded via pull or run $ docker run -d -i -t --name [container name] -p [host port:container port] [image name or ID] # Run Docker image as a container. For official images on Docker Hub, # it will automatically download and run them if they don\u0026#39;t exist locally. 
$ sudo docker ps # Show running containers that are launched from images $ sudo docker ps -a # Show all containers including stopped ones, in addition to running containers $ sudo docker stop [container name or container ID] # Stop currently running container $ sudo docker start [container name or container ID] # Start a stopped container $ sudo docker restart [container name or container ID] # Restart a running container $ sudo docker rm [container name or container ID] # Delete a container # To delete a container, you must first stop the container =) # +tip: When entering container ID, you only need to type 2-3 characters $ sudo docker rmi [image name or image ID] # Delete an image # Similarly, when using ID to delete, you only need to enter 2-3 characters =) $ sudo docker logs [container name or container ID] # View logs of the running container $ sudo docker exec -it [container ID] /bin/bash # Access container internally # To exit: $ exit Docker Command Options Let\u0026rsquo;s take a look at Docker command options!\n-i : --interactive : Activates standard input and keeps standard input open even when not attached to the container. Use this option to enter Bash commands.\n-t : --tty : Use TTY(pseudo-TTY). This option must be set to use Bash; without it, you can enter commands but the shell won\u0026rsquo;t be displayed.\n-d : --detach : Detached mode, also called daemon mode. The container runs in the background.\n-p : --publish : Connect host and container ports. (Port forwarding) ex) -p 80:80\n\u0026ndash;privileged : Use all Linux kernel capabilities (Capability) of the host inside the container. This allows access to the host\u0026rsquo;s main resources.\n\u0026ndash;rm : Automatically remove container when process terminates\n\u0026ndash;restart : Set restart policy when container terminates.\n-v : --volume : Data volume setting that connects host and container directories, so changes made on the host are applied identically inside the container. 
Concept of synchronization.\n-u : --user : Set the Linux user account name or UID under which the container will run. ex) --user ubuntu\n-e : --env : Set environment variables to use inside the container. Generally used to pass configuration values or passwords.\n\u0026ndash;link : Connect containers. [container name:alias] ex) --link \u0026quot;mysql:mysql\u0026quot;\n-h : --hostname : Set the hostname of the container.\n-w : --workdir : Set the directory where the process inside the container will run.\n-a : --attach : Connect standard input (stdin), standard output (stdout), and standard error (stderr) to the container.\n-c : --cpu-shares : CPU resource allocation setting. Default is 1024, and each value is applied relatively.\n-m : --memory : Set memory limit. ex) --memory=\u0026quot;100m\u0026quot;\n\u0026ndash;gpus : Configure the container to use the host\u0026rsquo;s NVIDIA GPU. To use this method, the host must have: NVIDIA GPU-equipped Linux server + NVIDIA driver installed + Docker version 19.03.5 or higher\n--gpus all : Use all GPUs --gpus \u0026quot;device=0.1\u0026quot; : Use specified GPU \u0026ndash;security-opt : Configure SELinux and AppArmor options. ex) --security-opt=\u0026quot;label:level:TopSecret\u0026quot;\n","date":"2022-10-25T17:59:31+09:00","image":"/posts/221025_docker_command/featured.png","permalink":"/en/posts/221025_docker_command/","title":"Docker Installation \u0026 Command Usage Complete Guide"},{"content":" Docker is an open-source containerization platform that enables applications to run quickly and reliably across different computing environments by packaging code and dependencies. 🐳 What is Docker? Docker\u0026rsquo;s core concepts are broadly divided into two: Container and Image\nDocker Image 💡 A Docker Image is a lightweight, standalone software package that includes code, runtime, system tools, system libraries, and configurations necessary for running an application. 
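To make the "everything packaged together" idea concrete, here is a minimal hypothetical Dockerfile that bundles a small Python app with its runtime and dependencies into a single image (the file names and base image tag are assumptions for illustration):

```dockerfile
# Hypothetical example: runtime + dependencies + code packaged as one image.
# The base image supplies the OS layer and the Python runtime.
FROM python:3.12-slim
WORKDIR /app
# Dependencies are installed at build time and baked into the image layers.
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
# Default command executed when a container starts from this image.
CMD ["python", "app.py"]
```

Running `docker build -t myapp .` against this file would produce an image that behaves the same anywhere Docker is available, which is exactly the property described above.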
Real-World Example If you were to install Jenkins on Linux using the traditional method:\n$ sudo apt-get install jenkins Running this command requires downloading multiple dependency packages together.\nHowever, using Docker:\n$ docker pull jenkins/jenkins:lts You can download a pre-configured image containing all necessary components at once.\n📦 Docker Registry \u0026amp; Docker Hub ℹ️ Docker Registry serves as a repository for sharing Docker images. Think of it as \u0026ldquo;GitHub for Docker.\u0026rdquo; Docker Hub is the official Docker registry, providing official images from vendors.\nWorkflow Users download images from the registry Run images as containers Configure multiple isolated environments on a single computer 🔄 Container Virtualization ✅ Container technology is \u0026ldquo;a server virtualization method that enables running multiple isolated instances within a single system,\u0026rdquo; where each container appears as an individual server to users. Important Note: Containers are not exclusive to Docker. Various container technologies exist, including OpenVZ, Libvirt, and LXC.\n🖥️ Types of Virtualization 1. Host Virtualization Structure: Guest OS runs on top of Host OS through virtualization software.\nExamples: VM Workstation, VMware Player, VirtualBox, etc. 📝 Advantages:\nSimple installation and configuration Minimal host requirements through hardware emulation Disadvantages:\nResource-intensive due to running OS on top of OS Significant performance overhead 2. 
Hypervisor Virtualization Structure: Software is installed and runs directly on hardware without a Host OS.\nTwo Approaches to Hypervisor Virtualization:\n1) Full Virtualization Guest OS accesses hardware through the hypervisor, not directly More stable but has performance overhead 2) Paravirtualization Guest OS is modified to call the hypervisor directly (hypercalls) instead of being fully emulated Faster but requires OS modifications 📝 Advantages:\nMore efficient without Host OS Better resource utilization Disadvantages:\nSlow startup time Still consumes significant resources as each VM runs an independent OS 3. Container Virtualization ⭐ Structure: Applications share the host OS kernel while maintaining isolated environments.\n📝 Advantages:\nLightweight: Typically tens of MB (VMs are tens of GB) Fast startup: No need to boot a separate OS Low resource usage: Efficient utilization of system resources High density: Run more containers on the same hardware Disadvantages:\nRequires the same OS kernel as the host system Cross-platform deployment can be challenging (e.g., Linux containers require a Linux host) 📊 Virtualization Comparison Category Host Virtualization Hypervisor Virtualization Container Virtualization Size Tens of GB Tens of GB Tens of MB Startup Speed Slow Slow Very Fast Resource Usage High Medium Low Isolation Level High High Medium Portability Low Medium High Setup Difficulty Easy Hard Medium 💡 Summary ✅ Core Value of Docker Container Virtualization:\nEfficiency: Provides the same functionality with far fewer resources than traditional virtualization Speed: Start and stop applications in seconds Consistency: Runs identically across development, testing, and production environments Scalability: Easy to add or remove containers as needed Docker is a core tool for modern application development and deployment, serving as the foundation for DevOps and microservices 
architecture.\n","date":"2022-10-24T00:00:00+09:00","image":"/posts/221024_about_docker/featured.png","permalink":"/en/posts/221024_about_docker/","title":"What is Docker? Docker Container and Types of Virtualization"},{"content":" Private networks and public networks - I\u0026rsquo;ve heard these terms somewhere, but I wanted to organize these concepts since I didn\u0026rsquo;t fully understand them. The relationship between private and public IPs\nℹ️ You might be confused when you first see the diagram above, but after reading the entire article, you\u0026rsquo;ll be able to understand it with an \u0026ldquo;Aha!\u0026rdquo; moment. 📅 2011, IPv4 Address Exhaustion Declared The Internet Assigned Numbers Authority (IANA), which manages internet addresses, declared that there would be no more IPv4 allocations. While IPv4 can use approximately 4.3 billion limited addresses, the rapid increase in internet demand exhausted the IPv4 addresses allocated to each continent.\n💡 IANA (Internet Assigned Numbers Authority) is an organization that manages IP addresses, top-level domains, etc. It is currently managed by ICANN. But How Are We Still Using IPv4? So here we are in 2022, 11 years after IPv4 ran out, and we\u0026rsquo;re still using IPv4 just fine. How is this possible?\nIPv6 was developed long ago and is gradually being commercialized. Nevertheless, IPv4 usage is still much more prevalent, so how has it been maintained well until now, 11 years later?\n✅ This is thanks to Private Networks. 🔌 What is a Private Network? A private network refers to a network that uses a specific range of IPv4 addresses within limited spaces such as homes and businesses, rather than on the public internet. Private IP ranges that belong to private networks can only be used within the private network (internal network), so they cannot be used on the public network (external network, internet).\nPrivate IP ranges\n🌐 What is a Public IP? 
A public IP is necessary for different PCs to communicate with each other over the internet and is used for purposes such as:\nBuilding website servers PC internet connection Communication via the internet ✅ Each country has an organization that manages public IPs. In Korea, the Korea Internet \u0026amp; Security Agency (KISA) manages them. Public IP address system\n💡 Concept Summary ℹ️ Private networks can only be used within limited spaces such as homes or businesses. So how do we communicate with other PCs that don\u0026rsquo;t use the same private network as us?\nWe need a public IP!\nIn other words, special measures are needed to communicate with the public internet from a private network. Private IPs are regulated to be used only within private networks, so private IPs cannot be used on the public internet.\n🔄 NAT (Network Address Translation) To address this, Network Address Translation (NAT) was devised as a method to convert IP addresses.\n💡 What is NAT?\nIt refers to a technology that sends and receives network traffic through a router while rewriting TCP/UDP port numbers and source and destination IP addresses of IP packets. Since changes occur in packets, IP and TCP/UDP checksums must also be recalculated and rewritten.\nThe reason for using NAT is usually to allow multiple hosts belonging to a private network to access the internet using a single public IP address.\nIn other words, it means converting to the IP used in the public/private network when communicating from a private network to a public network and vice versa. According to the above explanation, converting TCP/UDP port numbers of IP packets is actually because NAT includes not only IP addresses but also port conversion!\nIt\u0026rsquo;s called PAT or NAPT (Port Address Translation).\n📡 Router Functions These days, most homes have routers installed and in use (e.g., iptime, olleh, etc.).\nThese routers have various functions.\n1. 
DHCP Server Function First, there\u0026rsquo;s a DHCP (Dynamic Host Configuration Protocol) server function that assigns IPs to various devices connected through a single router.\n💡 Dynamic Host Configuration Protocol (DHCP)\nDHCP is an IP standard that simplifies host IP configuration management. It provides a method to dynamically assign IP addresses and other related configuration details to DHCP-enabled clients on the network using a DHCP server.\nThrough this, smart devices and PCs inside the house connected to the router are each assigned a private IP.\n⚠️ Why are they assigned private IPs?\nIf you go back to the very first explanation, you\u0026rsquo;ll understand\u0026hellip;?!\nSince the number of IP allocations is limited, we can\u0026rsquo;t assign a public IP to every home, or rather, every device, so we assign private IPs to build a private network! By building a private network this way, communication is possible internally, but we still can\u0026rsquo;t communicate with the external internet.\n2. NAT Function That\u0026rsquo;s why routers have a NAT function.\nFunction to convert private IPs to public IPs Build their own mapping table and manage pre-conversion and post-conversion values with a NAT table ✅ Of course, the router doesn\u0026rsquo;t have its own public IP! The router uses the public IP range provided by internet service providers (KT, SKT, LG, etc.)! 🛡️ What is a VPN (Virtual Private Network)? Going further, let\u0026rsquo;s learn about VPNs, which we may have used but don\u0026rsquo;t know exactly what role they play!\nVPN stands for Virtual Private Network, which, as the name suggests, is a private network but a virtual one. 
🔥 The VPN I knew was something that changes IPs or fakes IPs for illegal purposes\u0026hellip; 🤔\nI thought it was something like that, but it\u0026rsquo;s half right and half wrong!\nThe True Meaning of VPN VPN refers to being able to use an external computer as if it were connected to an internal network (private network).\nThe reason why the IP changes when using a VPN can also be understood if you think carefully about the private/public networks mentioned above.\n✅ The IP changes because you\u0026rsquo;ve connected to the internal network (private network) through VPN! 💼 VPN Use Cases 1. Remote Work/Telecommuting Companies with private networks set up VPN servers; using the external public IP address and a configured ID/password, you can access the company\u0026rsquo;s private network from anywhere.\n2. Remote Computer Access Similarly, with VPN set up on a personal computer, if you know its external public IP address, you can access your computer in Seoul from Jeju Island, or anywhere else.\n3. Bypassing Geographical Restrictions When a website in a certain country blocks access from our country\u0026rsquo;s IP, we cannot access that site. To reach it, we need to connect with an IP address from another country. Through a VPN, we can bypass the firewall as if we were accessing from an internal network in that country.\n4. Firewall Bypass Mechanism ⚠️ Hypothetical Scenario\nIf a company blocks access to SNS during work hours as an internal policy, we connect through a VPN set up at home or an overseas VPN. Then we can access SNS.\nWhy does this work?\nThe moment you connect to the VPN, a virtual tunnel is formed, and packets sent through the tunnel are broken into smaller pieces, encrypted, and encapsulated. 
At this time, although it passes through the company\u0026rsquo;s firewall, because it\u0026rsquo;s an encrypted/encapsulated packet, the firewall cannot detect that you\u0026rsquo;re trying to access SNS through VPN, so it lets the packet pass through.\nVPN tunneling structure\n📋 VPN Summary 👍 Advantages ✅ 🔒 Data security 🔒 Online privacy protection 📍 IP address change 🛡️ Personal protection 🚀 Bandwidth throttling prevention 👎 Disadvantages While VPN has many advantages as mentioned above, it also has disadvantages.\n⚠️ 🐢 Devices connected to VPN must communicate with the VPN server using encryption, so network speed is very slow ⚠️ Some VPNs with low reliability exist 💰 You must pay to use VPNs with high security 🚫 Not available in some countries ","date":"2022-10-05T17:34:36+09:00","image":"/posts/221005_about_ip/featured.jpg","permalink":"/en/posts/221005_about_ip/","title":"Private IP/Public IP? Private Network/Public Network? VPN?"},{"content":"What is Dispatcher Servlet? The term dispatch in Dispatcher Servlet means \u0026ldquo;to send\u0026rdquo;. 
The Dispatcher Servlet can be defined as a Front Controller that receives all incoming HTTP protocol requests first and delegates them to the appropriate controller.\nOperation Overview The more detailed process is as follows:\nWhen a request comes from the client, a servlet container such as Tomcat receives the request All these requests are received first by the Dispatcher Servlet, which is the Front Controller The Dispatcher Servlet processes common tasks first and then finds the controller that should handle the request and delegates the work Front Controller Pattern The term Front Controller refers to a controller that receives and processes all client requests coming to the server at the front of the servlet container, and it is a design pattern used together with the MVC architecture.\nHow Dispatcher Servlet Works The Dispatcher Servlet is the Front-Controller that receives requests first.\nIt passes through filters in the Servlet Context (Web Context) The Dispatcher Servlet receives the request first in the Spring Context The Dispatcher Servlet must find the appropriate controller and method to delegate the request, and the operation process is as follows.\nDetailed Operation Process 1. HTTP Request passes through Filter and is received by Dispatcher Servlet 2. Check request information and find the Controller to delegate RequestMappingHandlerMapping, one of the implementations of HandlerMapping, parses all controller beans written with @Controller and manages (request information, processing target) as a HashMap.\nIt finds the HandlerMethod object that contains the controller and method mapped to the request. 
Therefore, when a request comes in, HandlerMapping creates a Key object (request information) using HTTP Method, URI, etc., finds the HandlerMethod to process the request as Value, wraps it in a HandlerExecutionChain, and returns it.\nThe reason for this wrapping is to include interceptors that need to be processed before passing the request to the controller.\n3. Find and pass the HandlerAdapter to delegate to the Controller The Dispatcher Servlet does not delegate requests directly to the controller, but delegates them through a HandlerAdapter.\nThe reason for going through the HandlerAdapter interface is that there are various ways to implement controllers. While controller classes are mainly written using @Controller with @RequestMapping related annotations, controller classes can also be written by implementing the Controller interface.\nTherefore, Spring applies the adapter pattern through the HandlerAdapter interface, allowing requests to be delegated to Controllers regardless of the controller implementation method.\n4. HandlerAdapter delegates the request to the Controller Common pre/post processing is required around the HandlerAdapter\u0026rsquo;s call to the Controller.\nTypically:\nInterceptor processing ArgumentResolver to handle @RequestParam, @RequestBody, etc. in requests ReturnValueHandler that handles processing such as serializing the Body of ResponseEntity to JSON in responses Interceptors and ArgumentResolvers run before the adapter calls the controller, while the ReturnValueHandler runs after it returns. The adapter then invokes the controller\u0026rsquo;s method to handle the request.\n5. Process Business Logic The Controller calls the service and proceeds with business logic.\n6. Controller returns the return value Returns ResponseEntity or View name.\n7. 
HandlerAdapter processes the return value HandlerAdapter returns the response received from the controller to the Dispatcher Servlet after post-processing by the ReturnValueHandler, the response processor.\nIf the controller returns ResponseEntity → HttpEntityMethodProcessor uses MessageConverter to serialize the response object and set the response status (HttpStatus) If a View name is returned → the View is resolved through the ViewResolver 8. Send the server\u0026rsquo;s response to the client The response returned through DispatcherServlet passes through the Filter again and is returned to the client.\n","date":"2022-09-27T23:10:15+09:00","image":"/posts/220927_dispatcher/featured.png","permalink":"/en/posts/220927_dispatcher/","title":"Understanding Spring Dispatcher Servlet"},{"content":"JVM Components 1. Class Loader JVM\u0026rsquo;s Class Loader loads *.class files (bytecode files converted by javac) into Runtime Data Areas to run the program.\n💡 Class loading occurs at runtime, when a class is first accessed. This enables the lazy-initialization (initialization-on-demand) Singleton pattern.\nClass Loading is Thread-safe. 2. Execution Engine Executes the bytecode loaded into Runtime Data Areas by the Class Loader. Each bytecode instruction consists of a 1-byte opcode plus operands; the Execution Engine converts them to machine code and executes them instruction by instruction.\nKey Components Interpreter JIT (Just-in-Time) Compiler 3. Garbage Collector Responsible for removing unreferenced objects in the Heap area.\nBefore Java, programmers had to manage all program memory themselves. In Java, the JVM manages program memory through a process called garbage collection.\nℹ️ Garbage collection continuously finds and removes unused memory in Java programs. 4. Runtime Data Areas JVM\u0026rsquo;s memory area allocated from the OS. 
It holds data needed to run Java applications.\nRuntime Data Areas are divided into 5 areas as follows.\nℹ️ Shared Areas\nMethod and Heap areas are shared by all Threads Thread-specific Areas\nStack, PC Register, Native Method areas exist per Thread (1) Method Area Created when the JVM starts, it stores runtime constant pools, field and method code, static variables, method bytecode, etc. for each class and interface read by the JVM.\n💡 It\u0026rsquo;s a Non-Heap area stored in the Permanent area. This is a factor to consider when specifying the PermSize (Permanent Generation size) JVM option. (Note that since Java 8, the Permanent Generation has been replaced by Metaspace.) 1-1 Type Information Whether it\u0026rsquo;s an Interface Type name including package name Type access modifiers Associated Interface list 1-2 Runtime Constant Pool Stores all references to Type, Field, and Method JVM finds and references memory addresses through the Runtime Constant Pool 1-3 Field Information Field type Field access modifiers 1-4 Method Information Stores metadata for all Methods including Constructors Stores Method name, parameter count and type, return type, access modifiers, bytecode, local variable section size, etc. 1-5 Class Variable Stores variables declared with the static keyword Actual instances of non-primitive static variables are stored in Heap memory (2) Heap Area Space for storing objects created with the new operator.\nℹ️ Objects become targets for GC (Garbage Collector) when no referencing variables or fields exist. 
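The GC-eligibility rule above can be sketched with a small example. A WeakReference lets us observe reachability without keeping the object alive; note that System.gc() is only a hint to the JVM, so this sketch makes no claim about when collection actually happens:

```java
import java.lang.ref.WeakReference;

// Sketch of GC eligibility: an object allocated with `new` lives on the heap
// and becomes collectable once no strong references to it remain.
public class GcEligibilityDemo {
    public static void main(String[] args) {
        Object obj = new Object();                       // allocated on the heap
        WeakReference<Object> weak = new WeakReference<>(obj);

        System.out.println(weak.get() != null);          // true: still strongly reachable via obj

        obj = null;                                      // drop the last strong reference: now GC-eligible
        System.gc();                                     // a hint only; collection timing is up to the JVM
        System.out.println("requested GC");              // whether `weak` was cleared is JVM-dependent
    }
}
```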
(3) Stack Area Each Thread has its own Stack; every method call pushes a Frame containing the following elements.\n3-1 Local Variable Area Stores temporary data occurring during Method execution, such as local variables, parameters, and method call addresses Stored in 4-byte units: 4-byte primitives like int and float occupy 1 cell; 8-byte primitives like long and double occupy 2 cells; boolean generally occupies 1 cell 3-2 Operand Stack The Method\u0026rsquo;s workspace Indicates which commands to execute with which operands 3-3 Frame Data Includes Constant Pool Resolution, Method Return, Exception Dispatch, etc. Also holds references to Exception tables When an Exception occurs, the JVM consults this table to determine how to handle it (4) PC Register Created when a Thread starts, one exists per thread.\nHolds the address of the JVM instruction the Thread is currently executing.\n💡 When a Thread resumes after CPU scheduling, the PC (Program Counter) Register tells it which instruction to execute next. 
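The per-Thread stack frames described in (3) can be observed indirectly: unbounded recursion keeps pushing frames until the stack of that thread is exhausted and a StackOverflowError is thrown. A minimal sketch (the exact depth reached depends on the -Xss stack-size setting, so only the fact that frames were pushed is checked):

```java
// Each call to recurse() pushes a new frame (local variables, operand stack,
// frame data) on the calling thread's stack; the stack is finite, so the
// recursion eventually fails with StackOverflowError. The heap is untouched.
public class StackDepthDemo {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The thread's stack is full; execution continues after unwinding.
        }
        System.out.println(depth > 0); // true: some number of frames were pushed
    }
}
```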
(5) Native Method Stack Area for executing code written in actual executable machine code rather than bytecode generated from Java program compilation.\nSpace for code written in languages other than Java Such native code is invoked through the Java Native Interface (JNI) rather than being compiled to bytecode Here the code runs directly on the stack like an ordinary program, outside bytecode execution JVM Execution Order Memory Allocation\nWhen a program runs, the JVM receives the memory needed to execute the program from the OS The JVM divides this memory into multiple areas for use Compilation\nJava Compiler (javac) compiles *.java files and converts them to *.class Java bytecode Class Loading\nCompiled *.class files are loaded into JVM memory through the Class Loader Bytecode Interpretation\nLoaded *.class files are interpreted into machine code through the Execution Engine Execution and Management\nInterpreted bytecode is placed in memory areas and actually executed During execution, the JVM performs memory management tasks such as thread synchronization and garbage collection as needed ","date":"2022-09-23T22:27:28+09:00","image":"/posts/220923_jvm_2/featured.png","permalink":"/en/posts/220923_jvm_2/","title":"Deep Dive into JVM (Java Virtual Machine) (2)"},{"content":"While studying the Java language, I became curious about the JVM. I only knew it as a virtual computer that executes the code I write, so I wanted to dig deeper into how it works and what role it plays.\nWhat is JVM? An abbreviation for Java Virtual Machine, it refers to a virtual computer environment for running Java.\nSo what role does the JVM play? Java is not OS-dependent.\nTo meet this condition and execute the code we write, something is needed between Java and the OS.\nThat\u0026rsquo;s the JVM.\nCode Execution Process Source code (raw code) *.java must be converted to machine code (010101000101\u0026hellip;) for the CPU to recognize it.\nSo does *.java get directly converted to machine code and executed\u0026hellip;? No. 
The *.java file is first converted to Java bytecode (*.class) so that the JVM can recognize it.\nThis conversion is performed by the Java compiler.\nℹ️ The Java compiler is javac (javac.exe on Windows), located in the bin folder of the installed JDK.\nYou can generate .class files using the javac command You can execute these .class files using the java command So now it runs on the OS..? ⚠️ No\u0026hellip;. bytecode is not machine code, so it doesn\u0026rsquo;t run directly on the OS\u0026hellip;! At this point, the JVM plays the role of interpreting this bytecode so the OS can understand it.\nThanks to this role of the JVM, Java code written once can be executed regardless of the OS.\nOverall Process *.java → converted to bytecode form *.class → converted to machine code (binary code) through the JIT (Just In Time) compiler\nWhat is the JIT (Just In Time) Compiler? Also called JIT compilation or dynamic translation.\nJIT was introduced to complement the shortcomings of the interpreter approach.\nIt translates bytecode to machine code at the actual execution time of the program.\nPerformance Characteristics 💡 Since machine code is stored in a cache, code that has been compiled once executes quickly. Compiling a method to machine code with the JIT compiler is much slower than interpreting it once, but afterwards the compiled code runs fast However, for code that runs only once, it\u0026rsquo;s advantageous to interpret it directly without compiling A JVM with a JIT compiler tracks how often each method is executed and compiles it only when it exceeds a certain threshold.\nWhat is the Interpreter Approach? 
An interpreter translates source code line by line to machine code at each execution, so execution speed is slower than statically compiled languages.\nRepresentative Interpreter Languages Python JavaScript Database language SQL Advantages and Disadvantages Category Description Advantages Simple program modification Disadvantages Execution speed is slower than compiled languages 💡 Compilers translate source code to create executable files, so when program modifications occur, the source code must be recompiled.\nIf the program is small and simple, there\u0026rsquo;s no problem, but as the program grows larger, compilation often takes hours.\nHowever, with an interpreter, you just modify the source code and run it, so it\u0026rsquo;s widely used in programming where modifications occur frequently.\n","date":"2022-09-22T22:06:50+09:00","image":"/posts/220922_jvm_1/featured.png","permalink":"/en/posts/220922_jvm_1/","title":"Deep Dive into JVM (Java Virtual Machine) (1)"},{"content":"Object-Oriented Programming (OOP) vs Procedural Programming (PP) Object-oriented languages and procedural languages are not opposing concepts. So what exactly are object-oriented and procedural languages?\nWe typically refer to languages like Java, Python, and C# as object-oriented languages, while C is called a procedural language. However, this merely indicates what these languages orient toward - it doesn\u0026rsquo;t mean C can only do procedural programming or that Java and Python can only do object-oriented programming.\nRegardless of which language you use, you can write procedural code. Conversely, you can write object-oriented code even in C.\nThe Misconception of \u0026ldquo;Procedural-Oriented\u0026rdquo; In fact, calling something a procedural-oriented language is incorrect. 
All programming languages are based on procedures, so saying they \u0026ldquo;orient toward\u0026rdquo; procedures doesn\u0026rsquo;t make sense.\nTo use an analogy:\nIt\u0026rsquo;s like saying weightlifting is a sport that orients toward barbells, when in reality it\u0026rsquo;s a sport based on using barbells. Should we do weightlifting with dumbbells instead\u0026hellip;? In other words, the correct term is \u0026lsquo;Procedural Programming\u0026rsquo;, not \u0026lsquo;Procedural-Oriented\u0026rsquo;.\n💡 Object-Oriented Programming (OOP) and Procedural Programming (PP) simply represent different approaches to programming - they are not opposing concepts! Key Differences Procedural Programming: Creates functions centered around data Object-Oriented Programming: Bundles data and functions (behaviors) together into objects Criteria for Distinguishing Procedural and Object-Oriented Languages There are various ways to distinguish them, but broadly speaking, they can be categorized as follows:\nDoes it support encapsulation, polymorphism, and class inheritance? Can it restrict data access? 
Generally, languages that satisfy these criteria are considered to have stronger object-oriented characteristics.\nProcedural Programming Procedural programming literally means structuring code procedurally.\nIt\u0026rsquo;s an approach where you identify the sequence of data operations and create functions for necessary features, executing them procedurally (in order).\nObject-Oriented Programming Object-oriented programming bundles functionalities into objects.\nIn other words, you create individual objects, each bundling the behaviors (functions) and data they can handle.\nExample Imagine implementing a ride-hailing service:\nCar Object: Bundles all the behaviors (functions) a car can perform Driver Object: Bundles all the behaviors a driver can perform Passenger Object: Bundles all the behaviors a passenger can perform The algorithm is constructed through interactions between these objects by calling their methods and fields.\nSo Which Approach is Better? ℹ️ There\u0026rsquo;s no definitive answer. Use what fits your needs and your preferred style. Programming in the Past In the past, we didn\u0026rsquo;t need hardware and software on the scale we do today. Old languages like C, Fortran, and COBOL - representative procedural languages - were widely used.\nModern Programming As we entered the modern era, software development accelerated and code became increasingly complex.\nThis led to tangled algorithms, and code became difficult or impossible for humans to understand - resulting in spaghetti code.\nObject-oriented programming emerged as an alternative to address these issues.\nWhy is Object-Oriented Programming Dominant? Currently, object-oriented programming is predominantly used. 
The reasons are:\nFor complex programs, using procedural programming makes code more prone to tangling It offers fewer advantages for scalability and maintenance Pros and Cons of Procedural Programming Pros Program directly without creating objects or classes Create functions for needed features to call and reuse instead of copy-pasting Easy to trace program flow Cons Difficult to modify due to tight coupling between code sections (high coupling makes additions and modifications difficult) Difficult to debug (error checking) Pros and Cons of Object-Oriented Programming Pros Easier maintenance through modularization and encapsulation Code is easier to understand due to similarity with the real world Objects themselves are self-contained programs that can be reused in other programs Cons Most object-oriented programs tend to be relatively slower and use more memory Requires significant time in the design phase to make code understandable through real-world analogies There\u0026rsquo;s No Right Answer! Use the Right Tool for the Job When to Use Procedural Programming Typically used when the project scope is small and there\u0026rsquo;s little need for code reuse.\nBenefits:\nThe program itself is lighter Requires less development time and personnel compared to the object-oriented approach When to Use Object-Oriented Programming For large-scale projects where code needs to be reused, object-oriented programming is suitable (excluding initial development costs).\nBenefits:\nMore stable from a maintenance perspective Conclusion ⚠️ Today we explored object-oriented programming and procedural programming. While I don\u0026rsquo;t yet have deep knowledge of these topics, by researching various sources, I\u0026rsquo;ve gained a broad understanding of object-oriented and procedural programming. Next time, I\u0026rsquo;ll dive deeper into these concepts! 
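As a tiny code illustration of the bundling difference discussed above (a hypothetical Car example, not taken from any real codebase): the procedural style keeps data and the function that manipulates it separate, while the object-oriented style bundles data and behavior into one object.

```java
// Contrast of the two styles with a hypothetical Car example.
public class StyleContrast {
    // Procedural flavor: the data (speed) and the function that operates
    // on it are separate; the caller shuttles the data around.
    static int accelerate(int speed, int delta) {
        return speed + delta;
    }

    // Object-oriented flavor: the Car bundles its data with its behavior,
    // and the data is encapsulated behind methods.
    static class Car {
        private int speed = 0;        // data lives inside the object
        void accelerate(int delta) {  // behavior lives with the data
            speed += delta;
        }
        int speed() { return speed; }
    }

    public static void main(String[] args) {
        System.out.println(accelerate(0, 30)); // 30

        Car car = new Car();
        car.accelerate(30);
        System.out.println(car.speed());       // 30
    }
}
```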
","date":"2022-08-31T20:02:58+09:00","image":"/posts/220831_about_oop/featured.png","permalink":"/en/posts/220831_about_oop/","title":"Understanding Object-Oriented and Procedural Programming"}]