Data Preprocessing 1: Data Cleaning (Data Mining Theory 2)

Data Quality: Why Preprocess?

Data quality is multidimensional. The main criteria are:
- Accuracy: whether the data is correct or incorrect.
- Completeness: whether some data is unrecorded or inaccessible.
- Consistency: whether some data is outdated or dangling.
- Timeliness: whether the data is up to date.
- Believability: how far the data can be trusted.
- Interpretability: how easily the data can be understood.

Real-life data is messy. It is exposed to machine, human, and software errors and to transmission disruptions.
- Incomplete data: some attribute values are missing, or only aggregate data is available. e.g., Occupation = " " (not entered).
- Noisy data: the data contains noise, errors, or outliers. e.g., Salary = "-10" (error).
- Inconsistent data: data from different sources disagrees.
  - Age = "42", Date of Birth = "03/07/2010"
  - Old grading: "1, 2, 3", new grading: "A, B, C"
  - Discrepancies between duplicate records
- Intentional problems: e.g., January 1st is recorded for everyone whose birth date is unknown.

Missing Data

Data is not always accessible. Some records are never captured, e.g., customer income levels were not recorded at the time of sale. Missing data generally arises from:
- Hardware failures
- Data deleted because it was inconsistent with other records
- Data not entered because it was unclear
- Data not considered important at the time of entry
- Changes to the data not being recorded

Missing data must be resolved. The options are:
- Omission: the missing data is not processed and is treated as if it did not exist. How this affects the results depends on the data mining method used, and that effect should be known.
- Filling in manually: not always possible, and can be very time-consuming and costly.
- Filling in automatically:
  - Creating a new class for all missing values (such as "unknown")
  - Inserting the mean
  - Inserting the mean of the record's class
  - Inferring the value, e.g., with a Bayesian formula or a decision tree

Noisy Data

Noise is a random error in a measured value. Incorrect attribute values can result from:
- Faulty data collection instruments
- Data entry problems
- Data transmission problems
- Technology limitations
- Naming inconsistencies

Other situations that require data cleaning:
- Duplicate records
- Missing data
- Inconsistent data

Methods for handling noisy data:
- Binning: the data is sorted and divided into equal-frequency bins. The values in each bin are then smoothed by the bin mean, the bin median, or the bin boundaries.
- Regression: values are fitted to regression functions, which can also be used to fill in missing data.
- Clustering: outliers are found and cleaned.
- Joint use of computer and human knowledge: suspicious values are detected automatically and checked by humans (e.g., to deal with possible outliers).

Detecting Discrepancies in Data

- Using metadata (e.g., domain, range, dependency, distribution)
- Checking for field overloading
- Rule checks on the data (unique, consecutive, null)
- Using commercial software:
  - Data scrubbing: checking simple field information against rules (e.g., postal code, spell-check)
  - Data auditing: extracting rules from the data and identifying the records that violate them (e.g., finding outliers through correlation or clustering)

Data Migration and Integration

- Data migration tools: allow data to be transformed.
- ETL (Extraction/Transformation/Loading) tools: allow transformations to be managed, usually through a graphical interface.
- Integrated execution of the two tasks, iteratively and interactively (e.g., Potter's Wheel)
- Cleaning overloaded fields: chaining, coupling, multipurpose

Şadi Evren ŞEKER
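The automatic filling strategies listed for missing data (a new "unknown" class, the overall mean, and the mean by class) can be illustrated with a minimal sketch. The records, the field names (`occupation`, `income`, `segment`), and the function names below are hypothetical and chosen only for illustration. They are not taken from the lecture.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"occupation": "engineer", "income": 5000.0, "segment": "A"},
    {"occupation": None, "income": None, "segment": "A"},
    {"occupation": "teacher", "income": 3000.0, "segment": "B"},
    {"occupation": "nurse", "income": None, "segment": "B"},
    {"occupation": "clerk", "income": 4000.0, "segment": "A"},
]


def fill_with_unknown(rows, field, label="unknown"):
    """Put every missing categorical value into one new class."""
    return [{**r, field: label if r[field] is None else r[field]} for r in rows]


def fill_with_mean(rows, field):
    """Replace missing numeric values with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    overall = mean(observed)
    return [{**r, field: overall if r[field] is None else r[field]} for r in rows]


def fill_with_class_mean(rows, field, class_field):
    """Replace missing numeric values with the mean of the record's class.

    Assumes every class has at least one observed value for the field.
    """
    groups = {}
    for r in rows:
        if r[field] is not None:
            groups.setdefault(r[class_field], []).append(r[field])
    class_means = {label: mean(values) for label, values in groups.items()}
    return [
        {**r, field: class_means[r[class_field]] if r[field] is None else r[field]}
        for r in rows
    ]
```

With this toy data the overall mean fills both gaps with the same income, while the class mean gives segment A and segment B different values. That difference is why filling by class is often preferred when a class label is available.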
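The binning step described for noisy data (sort, split into equal-frequency bins, then replace values by the bin mean, median, or boundary) can be sketched as below. The function names and the sample values are assumptions made for illustration, and the sketch assumes there are at least as many values as bins.

```python
from statistics import mean, median


def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of nearly equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def smooth_by_mean(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_median(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundary(bins):
    """Replace every value with the closer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # each bin is already sorted
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Hypothetical sample values, deliberately unsorted.
sample = [21, 4, 34, 8, 25, 15, 28, 21, 24]
bins = equal_frequency_bins(sample, 3)
```

For this sample the bins are `[4, 8, 15]`, `[21, 21, 24]`, and `[25, 28, 34]`. Smoothing by boundary keeps each bin's extremes and pulls the interior values toward the nearer one, while mean and median smoothing flatten each bin to a single value.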
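The rule checks mentioned for detecting discrepancies (unique, consecutive, null), together with a metadata range check and a simple field rule of the data scrubbing kind, might look like the following sketch. The rows reuse the lecture's Occupation = " " and Salary = "-10" examples. The remaining values, the five-digit postal code pattern, and all function names are hypothetical.

```python
import re


def check_unique(values):
    """Unique rule: return the values that occur more than once."""
    seen, duplicates = set(), set()
    for v in values:
        if v in seen:
            duplicates.add(v)
        seen.add(v)
    return sorted(duplicates)


def check_consecutive(ids):
    """Consecutive rule: return the integers missing between min and max."""
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]


def check_null(rows, field):
    """Null rule: return indexes of rows whose field is missing or blank."""
    return [
        i for i, r in enumerate(rows)
        if r.get(field) is None or str(r[field]).strip() == ""
    ]


def check_range(rows, field, low, high):
    """Metadata range rule: return indexes of rows outside [low, high]."""
    return [
        i for i, r in enumerate(rows)
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def check_pattern(rows, field, pattern):
    """Field rule: return indexes of rows whose value does not match pattern."""
    return [
        i for i, r in enumerate(rows)
        if re.fullmatch(pattern, str(r.get(field, ""))) is None
    ]


# Hypothetical rows; the second one carries the lecture's example errors.
rows = [
    {"id": 1, "occupation": "engineer", "salary": 4200, "postal_code": "34000"},
    {"id": 2, "occupation": " ", "salary": -10, "postal_code": "3400"},
    {"id": 4, "occupation": "teacher", "salary": 3900, "postal_code": "06100"},
    {"id": 4, "occupation": "nurse", "salary": 4100, "postal_code": "35000"},
]
ids = [r["id"] for r in rows]
```

Each check returns the offending values or row indexes rather than fixing anything. This matches the idea that suspicious values are detected automatically and then reviewed by a person.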
